The present invention is directed to compositions and methods comprising peptides which have a high affinity for each other and
which, when linked to a protein, are used to help fold the protein into a compact structure. By virtue of its stability and constraints this scaffold
can prolong the activity of any embedded protein sequences in the presence of cellular and other proteases. The compact structure can
have other functional sequences embedded, and is preferable to linear and less constrained peptides for library screening, for creating
structurally-biased peptide libraries and for targeting to specific intracellular and extracellular compartments. Compositions of the present
invention can be displayed on the surface of viruses, archaebacteria, prokaryotic and eukaryotic cells for library screening, drug screening
and display. Methods of the present invention are useful for screening in vivo for intracellular effector proteins modulating signaling
pathways and to identify interacting proteins in vitro. Thus, the present invention is useful as a scaffold for gene therapy, for the isolation
of new therapeutic drug leads and for potential use as a therapeutic in physiological fluids.
PEPTIDES CAUSING FORMATION OF COMPACT STRUCTURES

Field of the Invention 

The compositions and methods of the invention relate to the use of dimerization peptides that self-associate
and their use with other proteins to effect the formation of compact structures.

Background of the Invention 

Proteins interact with each other largely through conformationally constrained domains. Although linear
peptides with freely rotating amino and carboxyl termini can have potent functions, as is known in the art,
the conversion of such peptide structures into pharmacologic agents is frequently difficult. Therefore, the
presentation of peptides in conformationally constrained structures can result in the generation of
pharmaceutics with high affinity to its target protein. Constrained peptides have many valuable features
compared to their linear analogs. These include: (i) enhanced stability to proteolysis [Szewczuk et al.,
Biochemistry 31:9132-9140 (1992)], due to the lack of unconstrained N- or C-terminal amino acid
residues accessible to amino- or carboxypeptidases and a non-extended structure that diminishes
endopeptidase susceptibility; (ii) a restricted conformation space that can result in a higher binding affinity
to cognate binding proteins due to a reduced entropic cost of binding [Hruby, Life Sci.
(1982); Rizo and Gierasch, Annu. Rev. Biochem. 61:387-418 (1992)]; (iii) the geometry to mimic reverse
turns, loops or other secondary structures (Rose et al., Adv. Protein Chem. 37:1-109 (1985); Stradley et

al., [...] 263-287 (1990); [...] of Sciences Publications (Bangalore, India), p469-[...]);
and a [...] restricted [...] easier pharmacophore and drug development.

Thus constrained peptides can form the basis for the isolation of new ligands and receptors and
subsequently for the rational design of small molecules which may be useful as drugs. The desirability of
this approach was shown using cyclic peptide libraries which have been used to discover and refine
potent ligands of a variety of receptors [O'Neil et al., Proteins: Structure, Function and Genetics 14:509-515
(1992); Giebel et al., Biochem. 34:15430-35 (1995); Spatola and Crozet, J. Med. Chem. 39:3842-46
(1996); Koivunen et al., J. Biol. Chem. 268:20205-10 (1993); Koivunen et al., J. Cell. Biol. 124:373-380
(1994)], enzymes [McBride et al., J. Mol. Biol. 259:819-27 (1996); Eichler et al., Mol. Divers. 1:233-240
(1996)], and other proteins [Wang et al., J. Biol. Chem. 270:23239-42 (1995)].

Several constrained protein scaffolds capable of presenting a protein of interest as a
conformationally-restricted domain are described in the literature and include minibody structures [Bianchi et al., J. Mol.
Biol. 236(2):649-59 (1994)], loops on beta-sheet turns, coiled-coil stem structures [Myszka and Chaiken,
Biochemistry 33:2363-2372 (1994)], zinc-finger domains, cysteine-linked (disulfide) structures,
transglutaminase linked structures, cyclic peptides, helical barrels or bundles, leucine zipper motifs
[Martin et al., EMBO J. 13(22):5303-5309 (1994); O'Shea et al., Science 243:538-42 (1993)], etc.

In addition, self-aggregation has been described for regulatory peptides such as the neuropeptide head
activator [as further outlined below; Bodenmuller et al., EMBO J. 5(8):1825-1829 (1986)], substance P
[Poujade et al., Biochem. Biophys. Res. Commun. 114:1109-1116 (1983)], met-enkephalin [Mastropaolo et
al., Biochem. Biophys. Res. Commun. 134:698-703 (1986)], and neuropeptide Y [Minakata et al., J. Biol.
Chem. 264:7907-7913 (1989)].

Pertinent to the subject of this invention is a peptide derived from the neuropeptide head activator (HA) 
isolated from the freshwater coelenterate Hydra (Bodenmuller et al., supra). Bodenmuller et al. 
demonstrated that under physiological conditions the HA peptide (pEPPGGSKVILF) dimerizes to form a 
biologically inactive molecule.

Dimerization of the monomer form yields a stable structure, which does not dissociate into its monomeric
components at concentrations as low as 10^-13 M. Further analysis of HA fragments revealed that a
fragment containing only the last six amino acid residues from the carboxy terminus of the HA peptide
(pSKVILF) dimerized more efficiently than HA itself. However, a fragment containing only the last 4 amino
acid residues (pVILF) and a fragment derived from the amino-terminal end of HA (pEPPGGSK) did not
lead to dimer formation. Most importantly, their analysis showed that both the replacement of the
carboxy-terminal phenylalanine and a modification thereof (e.g., introduction of an iodine in the para (4)
position of the aromatic ring) abolished dimerization completely or decreased dimerization tendency
drastically.

Aldwin et al. (US 5,491,074), referring to SKVILF as 'association peptide', added additional amino acid
residues at either its amino terminus or its carboxy-terminus and found that some of the
resulting proteins could form dimeric peptides. However, Aldwin et al. did not demonstrate or anticipate
the addition of more than one 'association peptide' to one polypeptide of interest.
Accordingly, it is an object of the invention to provide dimerization peptides for use in a variety of
applications.

SUMMARY OF THE INVENTION 

Peptides which have a moderate or high affinity for each other, when added as extensions to both the N-
and C-terminus of a protein, can be used to help fold the protein into a compact structure. Compared to
cognate linear proteins and disulfide-cyclized proteins, this new compact structure is more stable to
cellular and other proteases, and is significantly more conformationally constrained than the linear
peptides. The compact structure can have other functional sequences embedded within its sequence
and is preferable to linear and less constrained peptides for intracellular and extracellular library screens
and for targeting to specific intracellular locations. It can be used, with appropriate flanking residues on
each end of the varied residues in a random peptide sequence, to create structurally-biased peptide
libraries. By virtue of its stability and constraints, this scaffold can prolong the activity of any embedded
peptide sequences in the presence of proteases.

Peptides having the property of self-aggregating herein are referred to as dimerization peptides (DP).
The dimerization peptides of this invention comprise the sequence FLIVK (from amino-terminal to
carboxy-terminal). Examples of dimerization sequences which enhance the folding of a protein of interest
include, but are not limited to, FLIVK, EFLIVKS, KFVLIKS, VSIKFEL, LIVKS, EFLIVK, KFLIVK,
FESIKVL, and LKSIVEF. These dimerization peptides (DP) can be used in several combinations to yield
proteins of the general structure 'DP-protein' or 'DP-protein-DP', wherein 'DP' is a dimerization peptide and
'protein' comprises at least two amino acid residues. In addition, other amino acid sequences including,
but not limited to, linker sequences, tag sequences, targeting sequences and stabilization sequences are
generally included.

Other sequences include those with a high content of hydrophobic amino acids and 1 or 2 charged
side chains. Generally, a sequence at the terminus of the protein, composed of 5,
6, 7 and 8 amino acids with at least 3-4 highly hydrophobic residues (taken from F, I, L, M, V, W and Y),
will function in this fashion.

The compositions of this invention are displayed intracellularly or extracellularly and are useful to identify
binding proteins and molecules and to modulate intracellular signaling pathways. In one aspect of the
invention, a library of constrained proteins is evaluated in vivo for its bioactive potential. Thus the
invention accesses molecules or targets within living cells and provides for the isolation of the constrained
protein which has a phenotypic effect on this living cell. This method comprises the steps of a)
introducing a library encoding constrained proteins into a plurality of cells; and b) screening the plurality of
cells for an altered phenotype conferred upon the cell by a member of the library. The methods may also
include the steps of c) isolating cell(s) exhibiting an altered phenotype and d) isolating the member of the
library which caused an altered phenotype.

In another aspect, the compositions of the invention are useful to identify in vitro binding proteins and
other small molecules capable of binding to the constrained protein. This method comprises the steps of
a) providing a constrained protein of interest; b) binding the constrained protein of interest to a solid
support; c) providing a molecular library comprising a plurality of individual members; and d) providing
conditions allowing the individual members to bind to the constrained protein of interest. The method may
also include the step of e) isolating the bound library member.

In another aspect, the invention provides for the construction of molecular libraries comprising a plurality
of constrained proteins. This library of constrained proteins is used in in vitro binding assays to identify
individual members capable of binding to a protein of interest. This method comprises the steps of a)
providing a protein of interest; b) binding the protein of interest to a solid support; c) providing a molecular
library comprising a plurality of constrained proteins; and d) providing conditions allowing the constrained
proteins to bind to the protein of interest. The method may also include the step of e) isolating the bound
constrained protein.

The compositions of the invention are thus useful as a scaffold for gene therapy and for potential use as a 
therapeutic in physiological fluids. 

In an additional aspect of the invention, the constrained peptides are linked to fusion partners or are 
targeted to specific subcellular compartments. 

The present invention also provides molecular libraries encoding constrained proteins, comprising
plasmids and retroviral components and host cells comprising these molecular libraries. 

BRIEF DESCRIPTION OF THE FIGURES 

Figures 1A, 1B, 1C, 1D, 1E, 1F, and 1G depict schematic drawings of some embodiments of DP-protein
structures. Fig. 1A. Two dimerization peptides (DP) are fused to a linear protein (P), which results in a
DP-protein structure (shown here as DP-protein-DP), which may fold into a compact structure due to the
dimerization of DP. Fig. 1B. DP-protein structures comprising a linker (L). Fig. 1C. DP-protein structures
comprising a tag sequence (Tag). Tag1 and Tag2 are two different tags fused to one DP-protein,
indicating that many combinations of fusing tags to the DP-protein are possible. Fig. 1D. DP-protein with
linkers in between DP and P and two different tags. Fig. 1E. DP-protein, wherein a dimerization peptide
(DP1) added to the N-terminus of P is different from a dimerization peptide (DP2) added to the C-terminus
of P. Fig. 1F. DP-protein comprising stability sequences such as MG at its N-terminus and GGPP at its
C-terminus. Fig. 1G. DP-proteins, wherein multiple proteins P1, P2, and P3 are fused to dimerization
peptides.

Figures 2A, 2B, and 2C depict schematic drawings of complex DP-proteins. Fig. 2A. Covalently
associated double-loop structure. Due to the specific dimerization of DPhyd:DPhyd and DPLys:DPGlu, two
constrained peptides are formed within one DP-protein and a double loop structure is expected. The two
loop structures are covalently linked through a flexible glycine linker. Fig. 2B. Non-covalently associated
double-loop structure. Two DP-proteins, one comprising P1, the other comprising P2, are made, each
resulting in a compact structure due to the dimerization of DPhyd:DPhyd. When combined, due to the
specific dimerization of DPLys:DPGlu, the two constrained structures associate, yielding a double loop
structure. The two dimerization peptides DPhyd and DPLys or DPhyd and DPGlu are connected through a
flexible glycine linker. Fig. 2C. Non-covalently associated double-loop structure, wherein unconstrained
proteins P1 and P2 are forced into a compact structure due to the specific dimerization of DPhyd:DPhyd and
DPLys:DPGlu. The dimerization peptides which associate are confined to different DP-proteins but
associate with one another when the two DP-proteins are combined. Figs. 2A-C. DPhyd is a dimerization
peptide comprising mostly hydrophobic amino acids; DPLys is a dimerization peptide comprising mostly
lysines; DPGlu is a dimerization peptide comprising mostly glutamic acids; LP is a linker comprising
prolines; LG is a linker comprising glycines; P1 and P2 are proteins, which may or may not be the same.

Figures 3A and 3B show that novel peptides form observable dimers. Fig. 3A. Dimerization of SKVILFE-amide
and EFLIVKS-amide. Fig. 3B. Dimerization of EFLIVKS-amide when eluted from a C18 reversed
phase column at pH ~2.5 in ca. 25% acetonitrile.

Figure 4 shows LC/MS examination of the crude synthesis products from an all-single coupled Fmoc
synthesis of EFLIVKS-amide, for shorter sequences which can dimerize after electrospray ionization.

Figures 5A, 5B, and 5C show proteolytically resistant structures. Fig. 5A. Elastase digestion products of
the 18mer test protein sequence CGTIVTMEYRIDRTRSFC. Fig. 5B. Elastase digestion products of the
18mer test protein sequence CGTIVTMEYRIDRTRSFC with disulfide bonds between the two underlined
cysteines. Fig. 5C. Elastase digestion products of EFLIVKS-VGTIVTMEYRIDRTRSFV-EFLIVKS. Figs.
5A-C. Proteolytic fragments are monitored by reversed phase HPLC coupled to mass spectrometry
detection and identified.

Figure 6. Overlay of the 45 lowest energy structures (only the peptide backbone is shown) of
EFLIVKS-VGTIVTMEYRIDRTRSFV-EFLIVKS.

DETAILED DESCRIPTION OF THE INVENTION 
Cyclic or otherwise constrained peptides have many valuable features compared to their linear analogs,
including enhanced stability to proteolysis and a restricted conformation space that can result in a higher 
binding affinity for cognate binding proteins due to a reduced entropic cost of binding. These constrained 
peptides can form the basis for the subsequent design of small molecules which may be useful as drugs. 
Constrained peptides contained in minimized proteins may also be useful as an intermediate step in the 
design of agents blocking protein-protein interactions [Cunningham and Wells, Curr. Opin. Struct. Biol. 
7:457-462 (1997)], incorporated herein by reference, which may offer a novel method of regulating 
intracellular signaling pathways. When peptides are intracellularly expressed, they may modulate
intracellular signaling pathways [Souroujon and Mochly-Rosen, Nat. Biotechnol. 16(10):919-24 (1998)]. If 
the peptides are expressed in live mammalian cells, they may be screened for defined changes in cellular 
phenotype, and the resulting bioactive peptides may provide a route for the affinity isolation of their 
binding targets. 

Accordingly, the present invention provides dimerization peptides. By "dimerization peptide", "DP" or 
"association peptide" or grammatical equivalents herein is meant a peptide which either self-aggregates 
or dimerizes or associates with a second peptide. 

By "self-aggregates", or "dimerizes" or "associates" herein is meant that a peptide has an affinity for 
another peptide and non-covalently attaches itself to this peptide. The interaction between two molecules 
(e.g. two peptides) that are capable of binding to one another is usually characterized in terms of the 
strength with which these molecules interact, i.e., the "affinity" that the molecules have for one another. 
The range of measured affinity constants, for example, for antibody-antigen binding extends from 10^5 liter
mol^-1 to above 10^12 liter mol^-1 (Harlow and Lane, Antibodies: A Laboratory Manual, Cold Spring Harbor
Laboratory Press, 1988). For comparison, the affinity of trypsin for its substrate is approximately 1.25 x
10^4 liter mol^-1 and the affinity of lambda repressor for DNA is 10^10 liter mol^-1 (Harlow and Lane, supra).
Dimerization peptides provided by this invention usually have affinities for one another in the range from
about 10^5 liter mol^-1 to about 10^13 liter mol^-1, more usually from about 10^6 liter mol^-1 to about 10^13 liter mol^-1,
from about 10^7 liter mol^-1 to about 10^13 liter mol^-1 being preferred, from about 10^8 liter mol^-1 to about 10^13
liter mol^-1 being more preferred, from about 10^9 liter mol^-1 to about 10^13 liter mol^-1 being mostly preferred,
and from about 10^10 liter mol^-1 to about 10^13 liter mol^-1 being especially preferred. As is known to those in
the art, measurement of affinity constants is affected by temperature, pH, and solvent.
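
To put the affinity ranges above on a more familiar scale, the following minimal sketch converts an association constant Ka (liter mol^-1) into the corresponding dissociation constant Kd = 1/Ka and into a standard free energy of association, dG = -RT ln(Ka). The function names and the default temperature of 298.15 K are assumptions made for illustration only; they do not come from the specification, and real measurements depend on temperature, pH, and solvent as noted above.

```python
import math

R = 8.314462618  # molar gas constant, J mol^-1 K^-1


def dissociation_constant(ka):
    """Return Kd (mol/liter) for an association constant Ka (liter/mol)."""
    return 1.0 / ka


def binding_free_energy_kj(ka, temperature_k=298.15):
    """Return the standard free energy of association in kJ/mol.

    Uses dG = -R * T * ln(Ka), with Ka taken relative to a 1 M standard state.
    The default temperature is an illustrative assumption.
    """
    return -R * temperature_k * math.log(ka) / 1000.0


if __name__ == "__main__":
    # Lower bound, two intermediate values, and upper bound of the stated range.
    for exponent in (5, 8, 10, 13):
        ka = 10.0 ** exponent
        print(
            f"Ka = 1e{exponent} liter/mol, "
            f"Kd = {dissociation_constant(ka):.0e} mol/liter, "
            f"dG = {binding_free_energy_kj(ka):.1f} kJ/mol"
        )
```

Under these assumptions, each tenfold increase in Ka corresponds to roughly 5.7 kJ/mol of additional binding free energy at room temperature.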

By "peptide" herein is meant a compound which comprises at least two covalently attached amino acids
and includes proteins, polypeptides, oligopeptides and peptides. The peptide may be made up of
naturally occurring amino acids and peptide bonds, or synthetic peptidomimetic structures. Thus "amino
acid", or "amino acid residue", or "peptide residue", as used herein means both naturally occurring and
synthetic amino acids. For example, homo-phenylalanine, citrulline and norleucine are considered
amino acids for the purposes of the invention. "Amino acid" also includes imino acid residues such as
[...] in either the (R) or the (S) configuration. In the
preferred embodiment, the amino acids are in the (S)
or L-configuration. If non-naturally occurring side
chains are used, non-amino acid substituents may be
used, for example to prevent or retard in vivo
degradations.

PCT/US99/07374

In general, peptides of the invention, including DPs and test peptides, comprise at least about 3 amino
acids in length, usually from about 3 amino acids in length to about 100 amino acids, from about 3 amino 
acids in length to about 50 amino acids being preferred, from about 3 amino acids in length to about 10 
amino acids being more preferred, from about 4 amino acids in length to about 10 amino acids being 
mostly preferred and from about 5 amino acids in length to about 9 amino acids being especially 
preferred; peptides of 5, 6, 7, 8, 9, and 10 amino acids are preferred. Similarly, when larger test proteins 
are used, these may comprise at least about 3 amino acids in length, usually from about 3 amino acids in 
length to about 1000 amino acids, from about 3 amino acids in length to about 600 amino acids being 
preferred, from about 3 amino acids in length to about 400 amino acids being more preferred, from about 
3 amino acids in length to about 200 amino acids being mostly preferred and from about 3 amino acids in 
length to about 100 amino acids being especially preferred.

The dimerization peptides (DP) of the invention comprise the sequence NH2-X1-X2-X3-X4-X5-COOH and
generally are no more than 9 amino acids long, wherein X1, X2, X3, and X4 are generally selected from
the group consisting of amino acids A, V, I, L, W, F, M and Y and X5 is generally selected from the group
consisting of K, R, D and E.
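
The general formula above can be read as a simple sequence pattern: four residues drawn from the hydrophobic set followed by one charged residue, within a peptide of at most 9 amino acids. The sketch below expresses that reading as a regular expression. The function name `has_dp_core` and the choice to search for the pattern anywhere inside the peptide are assumptions for illustration; the specification says "generally", so this is a heuristic, not a definition of the claimed peptides.

```python
import re

HYDROPHOBIC = "AVILWFMY"  # residues listed in the text for X1-X4
CHARGED = "KRDE"          # residues listed in the text for X5

# Four hydrophobic residues followed by one charged residue.
CORE = re.compile(f"[{HYDROPHOBIC}]{{4}}[{CHARGED}]")


def has_dp_core(sequence, max_length=9):
    """Return True if the peptide is short enough and contains the X1-X5 core."""
    seq = sequence.upper()
    return len(seq) <= max_length and CORE.search(seq) is not None


if __name__ == "__main__":
    for peptide in ("FLIVK", "EFLIVKS", "SKVILFE", "LIVKS", "GGSGGSK"):
        print(peptide, has_dp_core(peptide))
```

Note that LIVKS, which the specification lists among its dimerization sequences, does not match this strict reading because only three hydrophobic residues precede the lysine. That is consistent with the formula being a general guide and not an exhaustive rule.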

In a preferred embodiment, the dimerization peptides (DP) comprise the sequence NH2-FLIVK-COOH.
As outlined above, other sequences include those with a high content of hydrophobic amino acids and 1
or 2 charged amino acid residues. Generally, a sequence composed of 5, 6, 7 and 8 amino acids with at
least 3-4 highly hydrophobic residues (taken from A, F, I, L, M, V, W, and Y) will function in this fashion.
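
The composition rule in the paragraph above (5 to 8 residues, at least 3 highly hydrophobic residues, and 1 or 2 charged residues) lends itself to a quick screening heuristic. The following is a minimal sketch under stated assumptions: the charged set D, E, K, R is taken from the X5 definition earlier in the text, and the function name and the `min_hydrophobic` threshold are illustrative choices. Passing this check does not demonstrate that a peptide actually dimerizes.

```python
HYDROPHOBIC = set("AFILMVWY")  # hydrophobic residues named in the text
CHARGED = set("DEKR")          # assumed charged set, from the X5 definition


def looks_like_dp(sequence, min_hydrophobic=3):
    """Heuristic composition check for a candidate dimerization peptide."""
    seq = sequence.upper()
    n_hydrophobic = sum(residue in HYDROPHOBIC for residue in seq)
    n_charged = sum(residue in CHARGED for residue in seq)
    return (
        5 <= len(seq) <= 8
        and n_hydrophobic >= min_hydrophobic
        and n_charged in (1, 2)
    )


if __name__ == "__main__":
    for peptide in ("EFLIVKS", "LIVKS", "KFLIVK", "GGSGGS"):
        print(peptide, looks_like_dp(peptide))
```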

In a preferred embodiment the dimerization sequence is NH2-XFLIVK-COOH, wherein X is either D, E, K,
or R.

In another preferred embodiment the dimerization sequence is NH2-FLIVKS-COOH.

In a preferred embodiment the dimerization sequence is NH2-XFLIVKS-COOH, wherein X is either
glutamic acid, aspartic acid, lysine or arginine.

In another embodiment, DP-proteins comprise sequences comprising (Lys)4-8 or (Arg)4-8 fused, as
outlined in more detail below, to one terminus of a protein, and (Asp)4-8 or (Glu)4-8 fused to the other
terminus of a protein. Such DP-proteins would be expected to form compact structures with the ends
forming a 4-8 residue ion-paired extended array.



Particularly preferred embodiments include, but are not limited to, the sequences EFLIVKS, KFLIVKS,
EEFLIVKKS, EEFLIVKKS-acid, VSIKFEL, SKVILFE, AFLIVKS, EALIVKS, EFAIVKS, EFLAVKS,
EFLIAKS, EFLIVAS, EFLIVKA, EFLKVKS, SKVILFE, EFLIVES, EKLKVKS, ESLSVKS, EFLIVES,
VSIKFEL, LIVKS, FESIKVL and LKSIVEF.
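
Several entries in the list above (AFLIVKS through EFLIVKA) appear to correspond to single alanine substitutions at each position of EFLIVKS. The specification does not describe how they were derived, so the sketch below is only an illustration of that observation: a hypothetical helper, `alanine_scan`, that enumerates every single-position alanine variant of a parent peptide.

```python
def alanine_scan(sequence):
    """Return all variants with exactly one non-alanine position replaced by A."""
    return [
        sequence[:i] + "A" + sequence[i + 1:]
        for i, residue in enumerate(sequence)
        if residue != "A"
    ]


if __name__ == "__main__":
    for variant in alanine_scan("EFLIVKS"):
        print(variant)
```

Such a scan is a common way to probe which positions of a short peptide contribute to an interaction; whether that was the intent here is not stated in the text.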

In a preferred embodiment, the DPs of the invention are covalently joined to a protein or peptide of interest,
frequently referred to herein as "protein of interest", "peptide of interest", "test protein", or "test peptide", 
depending on its size. 

By "protein of interest", "peptide of interest", "test protein", or "test peptide" or grammatical equivalents 
herein is meant a protein for which generally a function is sought or which has certain characteristics to 
be tested. Generally, test proteins are encoded by nucleic acids which are obtained from genomic DNA, 
cDNA or from random nucleic acids. These nucleic acids are expressed (as detailed below) to generate 
the test proteins. Smaller test proteins, usually test peptides, can also be synthesized on a peptide 
synthesizer. Synthesis on a peptide synthesizer allows the incorporation of synthetic analogs including, 
but not limited to, unnatural amino acids or peptidomimetic bonds to enhance potency and stability of the 
test protein or test peptide. 

In a preferred embodiment, the test peptides are randomized. By "random" or "randomized" or 
grammatical equivalents herein is meant that each nucleic acid and peptide consists of essentially 
random nucleotides and random amino acids, respectively. Generally these random test peptides are 
expressed from a molecular library. In a preferred embodiment, the molecular library comprises at least 
two different randomized nucleic acid sequences, with a plurality of different randomized nucleic acid 
sequences being preferred. These nucleic acid sequences are chemically synthesized, and may 
incorporate any nucleotide at any position. The synthetic process can be designed to generate 
randomized nucleic acids, to allow the formation of all or most of the possible combinations over the 
length of the sequence, thus forming a library of randomized nucleic acids encoding randomized 
candidate proteinaceous molecules (e.g., randomized candidate DP-proteins). The randomized nucleic 
acid sequences thus create a library of fragments, each encoding a different protein, which are ligated
into suitable vectors and transformed into cells, as outlined herein. 
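
As a rough illustration of the fully randomized case described above, the sketch below builds one library member as a DNA string: a randomized NNN insert flanked on both sides by codons for a dimerization peptide, giving a DP-protein-DP open reading frame. The flanking peptide EFLIVKS, the arbitrary codon choice in `encode`, and all function names are assumptions for illustration. Start and stop codons, vector sequences, and cloning sites are deliberately omitted.

```python
import itertools
import random

BASES = "TCAG"
# Standard genetic code, codons ordered TTT, TTC, TTA, TTG, TCT, ... GGG.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(itertools.product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(dna):
    """Translate DNA codon by codon; '*' marks a stop codon."""
    usable = len(dna) - len(dna) % 3
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, usable, 3))


def encode(peptide):
    """Back-translate a peptide using the first codon listed for each residue."""
    first_codon = {}
    for codon, aa in CODON_TABLE.items():
        first_codon.setdefault(aa, codon)
    return "".join(first_codon[aa] for aa in peptide)


def random_member(n_codons, dp="EFLIVKS", rng=random):
    """Return DNA encoding DP + a random insert of n_codons NNN triplets + DP."""
    insert = "".join(rng.choice(BASES) for _ in range(3 * n_codons))
    return encode(dp) + insert + encode(dp)


if __name__ == "__main__":
    rng = random.Random(0)
    for _ in range(3):
        print(translate(random_member(6, rng=rng)))
```

With unrestricted NNN triplets, 3 of the 64 codons are stop codons, so some members will encode truncated products; this is one practical reason biased randomization schemes are often used.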

In one embodiment, the library is fully randomized, with no sequence preferences or constants at any 
position. 
In another preferred embodiment the library is biased. That is, some positions within the sequence are
either held constant, or are selected from a limited number of possibilities. For example, in a preferred
embodiment triplets of nucleotides (NNN) are randomized to encode amino acid residues within a defined
class, for example, hydrophobic amino acids, hydrophilic residues, sterically biased (either small or large)
residues, towards the creation of cysteines, for cross-linking, prolines for SH-3 domains, serines,
threonines, tyrosines or histidines for phosphorylation sites, etc., or to purines, etc.
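
A biased library of the kind described above can be sketched by restricting, position by position, which codons may be drawn. In the minimal example below, each position in a pattern is either a named residue class or a fixed single-letter residue. The class names (`hydrophobic`, `phospho_site`, `any`) and the function names are hypothetical labels chosen for illustration; the specification does not define them.

```python
import itertools
import random

BASES = "TCAG"
# Standard genetic code, codons ordered TTT, TTC, TTA, TTG, TCT, ... GGG.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(itertools.product(BASES, repeat=3), AMINO_ACIDS)
}

# Illustrative residue classes (names are assumptions, not from the text).
RESIDUE_CLASSES = {
    "hydrophobic": set("AFILMVWY"),
    "phospho_site": set("STYH"),
    "any": set(AMINO_ACIDS) - {"*"},  # any residue, stop codons excluded
}


def codons_for(position):
    """Return the sorted codons allowed at a position (class name or residue)."""
    allowed = RESIDUE_CLASSES.get(position, {position})
    return sorted(codon for codon, aa in CODON_TABLE.items() if aa in allowed)


def biased_insert(pattern, rng=random):
    """Draw one codon per pattern position from that position's allowed codons."""
    return "".join(rng.choice(codons_for(position)) for position in pattern)


if __name__ == "__main__":
    rng = random.Random(1)
    # Hydrophobic residue, fixed cysteine, any residue, phosphorylatable residue.
    dna = biased_insert(["hydrophobic", "C", "any", "phospho_site"], rng)
    print(dna, "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, len(dna), 3)))
```

Drawing uniformly over codons, as done here, weights residues by their codon count; a real design might instead use degenerate synthesis mixtures, which this sketch does not model.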

The term "random peptide library" or "random protein library" is meant herein to comprise recombinant
vectors encoding random peptides (or random proteins), the random peptides (or random proteins)
encoded by those recombinant vectors, recombinant vectors encoding fusion proteins comprising
random peptides (or random proteins), and the fusion proteins, comprising random peptides (random
proteins), encoded by those recombinant vectors.

In a preferred embodiment, the sequence of the candidate DP-protein is used to generate derivatives of
the originally isolated candidate DP-protein. For example, the sequence of the candidate DP-protein may
be the basis of a second round of (biased) randomization, to generate derivative DP-proteins with
increased or altered activities. Alternatively, the second round of randomization may change the affinity
of the bioactive agent. Furthermore, it may be desirable to operably link the protein component of the
identified DP-protein to different dimerization sequences than those used to isolate the original candidate
DP-protein. This may result in a fusion protein that is more or less constrained and thus may have altered
activities. It may also be desirable to "walk" around a potential binding site, in a manner similar to the
mutagenesis of a binding pocket, by keeping one end of the ligand region constant and randomizing the
other end to shift the binding of the peptide around.



In a preferred embodiment, the test protein comprises a wild-type or naturally occurring sequence.
Alternatively, it may be a derivative protein thereof, that is, it may contain amino acid substitutions,
insertions or deletions, or combinations thereof which are not found in the originally isolated DP-protein.
These modifications are routinely performed by in vitro mutagenesis of the nucleic acid encoding the
protein of interest. In vitro mutagenesis methods are well known to those in the art and are found in, for
example, Sambrook et al., Molecular Cloning: A Laboratory Manual (New York: Cold Spring Harbor
Laboratory Press, 1989) and Ausubel et al., Short Protocols in Molecular Biology (John Wiley & Sons, [...]).
The DPs of the invention are covalently joined to the test protein. By "covalently attached" or "covalently
joined" or grammatical equivalents herein is meant that two moieties are attached by at least one bond,
including sigma bonds, pi bonds, and coordination bonds. As is more fully outlined below, the DPs of the
invention are covalently joined to fusion partners and/or test peptides. Covalent attachment to fusion
partners and test peptides is accomplished by employing cysteine (disulfide) linkage, peptide bond
linkage, a variety of bifunctional agents (cross-linking agents, such as maleimidobenzoic acid,
methyldithioacetic acid, mercaptobenzoic acid, S-pyridyl dithiopropionate, etc.), or attachment via
nonpeptide bonds. Examples of nonpeptide bonds include, but are not limited to, retroinverso bonds,
N-methyl amine bonds, depsipeptide bonds, hydroxyamino peptide isosteres, thioamide bonds, peptoids
[Simon et al., Proc. Natl. Acad. Sci. USA 89:9367-71 (1992)], double bonds, reduced peptide bonds,
ethylene bonds, keto peptide bond analogs, methylene sulfoxides, and methylene sulfides [Rizo and
Gierasch, Annu. Rev. Biochem. 61:387-418 (1992)].

WO 99/51625 PCT/US99/07374

In general, as detailed below, the DPs are joined to peptides or proteins using peptide bonds, for example 
by expressing nucleic acids that encode the DP and the respective peptide or protein of interest. 

In a preferred embodiment, the DPs of the invention are joined to a test protein to form fusion proteins, in
a wide variety of ways, as will be appreciated by those in the art. As is more fully described below, they
can be joined to one or more internal positions, or preferably to either or both of the N- and C-termini.
The attachment of DP to a fusion partner results in a structure referred to herein as DP-protein.

By "DP-protein" herein is meant a compound comprising at least one dimerization peptide covalently
joined to at least one peptide. DP-proteins include candidate DP-proteins, as defined below. As will be
appreciated by those in the art, when a single DP is used, the compositions and methods of the invention
find use in the association of two test peptides. That is, a first DP (DP1) can be joined to a first test
protein (protein1), and a second DP (DP2) can be joined to a second test protein (protein2). When two
DPs are used, the compositions find use in the generation of constrained test peptides.

In a preferred embodiment, at least one DP is joined to the N-terminus of a test protein, with the
attachment of two DPs being preferred. In this embodiment, when two or more DPs are joined to the test
protein, the DPs may be identical in sequence or may have a different sequence. The DPs may or may
not be separated by a linker sequence as further outlined below. In an embodiment wherein the same
DP or two different DPs with affinity for one another are joined to the N-termini of two different test
proteins, protein1 and protein2, generating, for example, DP-protein1 and DP-protein2, the two DPs
associate with one another and protein1 and protein2 are brought into proximity. Due to the presence of
the same DP sequence, in addition to protein1:protein2 heterodimers, protein1:protein1 homodimers and
protein2:protein2 homodimers can be made.
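
The point that a shared DP sequence yields homodimers as well as heterodimers can be made concrete by enumerating the unordered pairs. The sketch below lists the possible dimer species and, purely as an illustrative assumption not made in the specification, the fractions expected if both fusion proteins are present in equal amounts and pair without any preference.

```python
from itertools import combinations_with_replacement


def dimer_species(fusions):
    """Return every unordered pairing of fusion proteins that share one DP."""
    return [f"{a}:{b}" for a, b in combinations_with_replacement(fusions, 2)]


def statistical_fractions(fusions):
    """Return expected fractions for equimolar fusions pairing at random.

    This is an idealized assumption; real populations depend on concentration,
    affinity, and sterics.
    """
    n = len(fusions)
    return {
        f"{a}:{b}": (1 if a == b else 2) / n ** 2
        for a, b in combinations_with_replacement(fusions, 2)
    }


if __name__ == "__main__":
    print(dimer_species(["protein1", "protein2"]))
    print(statistical_fractions(["protein1", "protein2"]))
```

Under that idealized assumption the heterodimer makes up half of the population, which suggests why two different DPs with affinity for one another would be the natural choice when only the heterodimer is wanted.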

In a preferred embodiment, at least one DP is joined to the C-terminus of a test protein, with the
attachment of two DPs being preferred. As above, the DPs may be identical in sequence or may have a
different sequence. The DPs may or may not be separated by a linker sequence as further outlined
below. In an embodiment wherein the same DP or two different DPs with affinity for one another are
joined to the C-termini of two different test proteins, protein1 and protein2, generating, for example,
protein1-DP and protein2-DP, the two DPs associate with one another and protein1 and protein2 are
brought into proximity. Due to the presence of the same DP sequence, in addition to protein1:protein2
heterodimers, protein1:protein1 homodimers and protein2:protein2 homodimers are formed.

In a preferred embodiment, at least one DP is joined to an internal position of a test protein, with
attachment of two DPs being preferred. As above, the DPs may be identical in sequence or may have a
different sequence. The DPs may or may not be separated by a linker sequence as further outlined
below. When two or more DPs are joined to an internal position, the DPs may be juxtaposed, that is,
inserted into the same internal position, for example, generating (N)protein(I)-DP1-DP2-(I)protein(C), or
the DPs may be separated and joined to different internal positions, for example, generating
(N)protein(I)-DP1-(I)protein(I)-DP2-(I)protein(C), wherein 'N' is the amino-terminal part of the test protein,
'C' is the carboxy-terminal part of the test protein, and 'I' is an internal part of the protein, flanked by the
dimerization peptides DP1 and DP2. In an embodiment, wherein DP1 and DP2 are of identical sequence or
have an affinity for one another, they associate and the part of the test protein enclosed by DP1 and DP2,
i.e., (I)protein(I), forms a loop structure.
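
As a rough illustration of the two internal arrangements described above, the following Python sketch splices two DPs into a one-letter protein sequence, either juxtaposed at one position or at two different positions. The function `insert_dps`, the 0-based split points, and the choice of EFLIVKS and VGTIVTMEYRIDRTRSFV as inputs are assumptions made for the example; the text does not specify insertion positions.

```python
def insert_dps(protein, dp1, dp2, pos1, pos2=None):
    """Insert DP1 and DP2 at internal positions of a one-letter protein sequence.

    pos1 and pos2 are 0-based split points; when pos2 is omitted the two DPs
    are juxtaposed at the same internal position.
    """
    if pos2 is None:
        pos2 = pos1
    if not 0 < pos1 <= pos2 < len(protein):
        raise ValueError("positions must be internal and ordered")
    n_part, internal, c_part = protein[:pos1], protein[pos1:pos2], protein[pos2:]
    return n_part + dp1 + internal + dp2 + c_part


TEST_PROTEIN = "VGTIVTMEYRIDRTRSFV"  # stand-in test protein, chosen only for illustration
DP = "EFLIVKS"                       # one DP sequence, used here for both DP1 and DP2

juxtaposed = insert_dps(TEST_PROTEIN, DP, DP, 9)     # both DPs at one position
separated = insert_dps(TEST_PROTEIN, DP, DP, 5, 12)  # residues 5..11 become the loop
print(juxtaposed)
print(separated)
```

In the second call the segment between the two DPs is the part that would be expected to form the loop.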

In a preferred embodiment, the linkage of the DP to the test protein is direct; that is, there is a direct
fusion of the DP sequence with the test protein sequence.

In a preferred embodiment, the linkage of the DP to the test protein is indirect; that is, a linker or spacer is
used. The term "linker", or "spacer", or "tethering sequence" or grammatical equivalents is meant herein
to comprise a molecule or a group of molecules that connects two molecules. Often the inclusion of a
linker serves to place the two molecules in a preferred configuration, for example, imposing a more
constrained configuration on two molecules (such as when linkers comprising prolines are used) or imposing
a more relaxed configuration on two molecules (that is, minimal steric hindrance, such as when linkers
comprising serines and glycines are used).



In a preferred embodiment, a linker sequence is included at any position, i.e., in between DP and the
protein of interest, in between two unrelated DPs, or in between two fusion partners. As outlined herein,
the linker sequence can be proteinaceous or non-proteinaceous. Linker sequences between individual
components of the compound may be desirable, for example, to allow the protein of interest to interact
with potential targets unhindered, to constrain the protein of interest, or to allow functioning of a new
property conferred upon the protein of interest (e.g., subcellular localization). For constraining a protein
of interest, proline-containing linkers are particularly preferred. As is known in the art, prolines confer
unique conformational constraints on a polypeptide chain. Useful proline linkers include proline-glycine
polymers (including, but not limited to, (PG)n, (PPGG)n, (PP)n, and combinations thereof, wherein n is an
integer of at least one). Preferred linkers allowing some flexibility of the polypeptide include glycine-serine
polymers (including, but not limited to, (GS)n, (GSGGS)n and (GGGS)n, and combinations thereof, wherein
n is an integer of at least one), glycine-alanine polymers, alanine-serine polymers, and other flexible
linkers such as the tether for the shaker potassium channel, and a large variety of other flexible linkers, as
will be appreciated by those in the art. Glycine-serine polymers are particularly preferred.
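
The repeat notation used for the linkers, such as (GS)n or (PPGG)n, can be made concrete with a short Python sketch. The helper names, the default of two GS repeats, and the placeholder partner sequence "AAA" are hypothetical choices for illustration; only the repeat units themselves come from the text.

```python
def repeat_linker(unit: str, n: int) -> str:
    """Return the linker (unit)n for an integer n of at least one."""
    if n < 1:
        raise ValueError("n must be an integer of at least one")
    return unit * n


CONSTRAINING_UNITS = ("PG", "PPGG", "PP")  # proline-containing repeat units named in the text
FLEXIBLE_UNITS = ("GS", "GSGGS", "GGGS")   # glycine-serine repeat units named in the text


def join_with_linker(left: str, right: str, unit: str = "GS", n: int = 2) -> str:
    """Join two one-letter sequences through a repeated linker unit."""
    return left + repeat_linker(unit, n) + right


print(repeat_linker("GSGGS", 2))
print(join_with_linker("EFLIVKS", "AAA", unit="PG", n=3))  # "AAA" is a placeholder partner
```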

In a preferred embodiment, the DP-protein comprises two DPs. In this embodiment, the two DPs are
used to conformationally constrict the test protein. DPs, when covalently joined at the N- and C-terminus
of a protein of interest (ranging from 3 to 50 or more amino acid residues), help the protein of interest to
fold into a compact structure (also referred to herein as a constrained structure) which is more
proteolytically resistant than the linear protein sequence alone. Particularly preferred in this embodiment,
and when screening for interacting molecules, are random test proteins.

In a preferred embodiment, a first DP (DP1) is fused to the N-terminus (N) of a test protein, and a second
DP (DP2) is fused to the C-terminus (C) of a test protein (protein), generating, for example,
DP1-(N)protein(C)-DP2. In this embodiment, the first and second DP can be the same or different. When two
DPs are used that can self-aggregate, the two DPs associate and impose a constrained structure upon the
test protein enclosed in between the two DPs. When two different DPs (DP1 and DP2) are joined to the
N-terminus and to the C-terminus of a test protein, the two different DPs nevertheless can associate and
impose a constrained structure upon the test protein, provided that DP1 and DP2 have an affinity for one
another. Different DP sequences that can associate are, for example, KFLIVKS and EFLIVES.



Particularly preferred examples of DP-proteins include, but are not limited to: (i) EFLIVKS-protein-
EFLIVKS; (ii) KVLIKS-protein-EFLIVES; (iii) VSIKFEL-protein-VSIKFEL; (iv) LIVKS-protein-LIVKS;
(v) EFLIVK-protein-EFLIVK; (vi) FESIKVL-protein-FESIKVL; and (vii) LKSIVEF-protein-LKSIVEF.



More specifically, DP1-protein-DP2-like compounds provided by this invention comprise (i) EFLIVKS-
VGTIVTMEYRIDRTRSFV-EFLIVKS, wherein the protein sequence is obtained from the barley c2-
chymotrypsin inhibitor [VGTIVTMEYRIDRTRSFV; Leatherbarrow and Salacinski, Biochemistry 30:10717-
21 (1991)] and DP1 and DP2 are identical; (ii) EFLIVKS-VGTIVTMEYRIDRTRSFV-SKVILFE, wherein the
sequence of DP2 is the reverse sequence of DP1; (iii) SKVILFE-VGTIVTMEYRIDRTRSFV-EFLIVKS,
wherein the sequence of DP1 is the reverse of DP2; (iv) SKVILFE-VGTIVTMEYRIDRTRSFV-SKVILFE,
wherein both DP1 and DP2 are identical, however, the reverse of DP1 and DP2 shown in (i); (v) KFLIVKS-
VGTIVTMEYRIDRTRSFV-KFLIVKS, wherein DP1 and DP2 are identical; (vi) KFLIVKS-
VGTIVTMEYRIDRTRSFV-EFLIVES, wherein DP1 and DP2 are different; (vii) EFLIVES-
VGTIVTMEYRIDRTRSFV-EFLIVES, wherein DP1 and DP2 are identical; (viii) EKLKVKS-
VGTIVTMEYRIDRTRSFV-EKLKVKS, wherein DP1 and DP2 are identical; (ix) ESLSVKS-
VGTIVTMEYRIDRTRSFV-ESLSVKS, wherein DP1 and DP2 are identical; (x) EFLKVKS-
VGTIVTMEYRIDRTRSFV-EFLKVKS, wherein DP1 and DP2 are identical; (xi) EEFLIVKKS-
VGTIVTMEYRIDRTRSFV-EEFLIVKKS, wherein DP1 and DP2 are identical; (xii) MGEFLIVKS-
VGTIVTMEYRIDRTRSFV-EFLIVKSGPP, wherein DP1 and DP2 are identical and DP1 comprises amino
acids MG and DP2 comprises amino acids GPP for conferring increased stability; (xiii)
KKKKKKGGGGEFLIVKS-VGTIVTMEYRIDRTRSFV-EFLIVKS, wherein DP1 and DP2 are identical and
DP1 comprises amino acids KKKKKKGGGG for conferring increased solubility; (xiv) KKKGSGSEFLIVKS-
VGTIVTMEYRIDRTRSFV-EFLIVKS, wherein DP1 and DP2 are identical and DP1 comprises amino acids
KKKGSGS for conferring increased solubility; (xv) EFLIVKS-STKSIPPQS-EFLIVKS, wherein the 9-mer
insert represents an analog of a protease inhibitor [Gariani and Leatherbarrow, J. Peptide Res. 49:467-75
(1997)]; (xvi) MGEFLIVKS-GGGGDYKDDDDKGGGG-EFLIVKSGPP, wherein DP1 and DP2 are identical
and DP1 comprises amino acids MG and DP2 comprises amino acids GPP for conferring increased
stability and the protein comprises the flag epitope (DYKDDDDK) with glycine spacers; and (xvii)
MGEFLIVKS-GGGGYPYDVPDYASLGGGG-EFLIVKSGPP, wherein DP1 and DP2 are identical and DP1
comprises amino acids MG and DP2 comprises amino acids GPP for conferring increased stability and
the protein comprises the influenza hemagglutinin epitope tag (YPYDVPDYASL) with glycine spacers.
The dimerization sequence is underlined in all the above examples.
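
The construction pattern shared by the examples above (a test protein flanked by two DPs, optionally with short terminal extensions) can be sketched as follows. This is a minimal illustration, not the method of the disclosure: the function `flank` and its parameter names are hypothetical, and the sketch only concatenates one-letter sequences.

```python
LOOP = "VGTIVTMEYRIDRTRSFV"  # protein sequence quoted in the examples above


def flank(protein, dp1, dp2=None, n_ext="", c_ext=""):
    """Build n_ext + DP1 + protein + DP2 + c_ext; DP2 defaults to DP1 (identical DPs)."""
    if dp2 is None:
        dp2 = dp1
    return n_ext + dp1 + protein + dp2 + c_ext


identical = flank(LOOP, "EFLIVKS")                         # identical DPs
reversed_dp2 = flank(LOOP, "EFLIVKS", "EFLIVKS"[::-1])     # DP2 is the reverse of DP1
stabilized = flank(LOOP, "EFLIVKS", n_ext="MG", c_ext="GPP")  # MG and GPP extensions

for construct in (identical, reversed_dp2, stabilized):
    print(construct)
```

Reversing a DP with slicing (`[::-1]`) reproduces the relationship between EFLIVKS and SKVILFE described in the list.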

In a preferred embodiment, a first DP (DP1) is joined to the N-terminus of the test protein and a second
DP (DP2) is joined to an internal position of the test protein. A structure such as DP1-(N)protein(I)-DP2-
(I)protein(C) is generated. In an embodiment, wherein DP1 and DP2 are of identical sequence or have an
affinity for one another, they associate and the part of the test protein enclosed by DP1 and DP2, i.e.,
(N)protein(I), forms a loop.



In a preferred embodiment, a first DP (DP1) is joined to the C-terminus of the test protein and a second
DP (DP2) is joined to an internal position of the test protein. A structure such as (N)protein(I)-DP2-
(I)protein(C)-DP1 is generated. In an embodiment, wherein DP1 and DP2 are of identical sequence or have
an affinity for one another, they associate and the part of the test protein enclosed by DP1 and DP2, i.e.,
(I)protein(C), forms a loop.

In a preferred embodiment, both the first DP (DP1) and the second DP (DP2) are joined to an internal
position of the test protein or preferably to two different internal positions of the test protein, generating a
structure such as (N)protein(I)-DP1-(I)protein(I)-DP2-(I)protein(C). In an embodiment, wherein DP1 and DP2
are of identical sequence or have an affinity for one another, they associate and the part of the test protein
enclosed by DP1 and DP2, i.e., (I)protein(I), forms a loop.



In a preferred embodiment, different dimerization peptides are fused to more than one protein which will
be covalently associated with one another. In this embodiment, the individual dimerization peptides may
also be separated by linkers inserted in between DP and a protein and/or in between individual DPs. For
example, a DP fusion protein such as DPhyd-LP-protein1-LP-DPhyd-LG-DPLys-LP-protein2-LP-DPGlu (see
Figure 2A), wherein DPhyd is a DP comprising mostly hydrophobic amino acid residues, DPLys is a DP
comprising mostly lysine residues, DPGlu is a DP comprising mostly glutamic acid residues, LP is a linker
comprising proline residues, LG is a linker comprising glycine residues, and protein1 and protein2 are
proteins which comprise different protein sequences, can be made. The above illustrated bivalent DP
fusion protein will allow two constrained proteins to be covalently associated with one another within a
single fusion protein, forming a 'double-loop' structure. Within such a structure, the first loop (comprising
protein1) is formed by the dimerization of the first DPhyd with the second DPhyd and the second loop
(comprising protein2) is formed by the dimerization of DPLys and DPGlu. The two loop structures may be
separated by a flexible linker such as a glycine or serine/glycine linker as outlined above.
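
The double-loop layout can be checked symbolically. The sketch below treats the fusion protein as an ordered list of component labels and reports which proteins sit between two DPs that have affinity for one another. It is a simplified model under stated assumptions: the labels (DPhyd, DPLys, DPGlu, LP, LG), the affinity table, and the rule that each protein is constrained by its nearest flanking DPs are illustrative conventions, not data from this disclosure.

```python
# Pairs of DP classes assumed to associate: hydrophobic with hydrophobic,
# and lysine-rich with glutamic-acid-rich (in either order).
AFFINITY = {("DPhyd", "DPhyd"), ("DPLys", "DPGlu"), ("DPGlu", "DPLys")}


def constrained_proteins(parts):
    """Map each protein label to the flanking DP pair expected to close a loop around it."""
    loops = {}
    for i, part in enumerate(parts):
        if not part.startswith("protein"):
            continue
        # Nearest DP on each side of the protein, skipping linker labels.
        left = next((p for p in reversed(parts[:i]) if p.startswith("DP")), None)
        right = next((p for p in parts[i + 1:] if p.startswith("DP")), None)
        if (left, right) in AFFINITY:
            loops[part] = (left, right)
    return loops


BIVALENT = "DPhyd-LP-protein1-LP-DPhyd-LG-DPLys-LP-protein2-LP-DPGlu".split("-")
print(constrained_proteins(BIVALENT))
```

For the bivalent layout the sketch reports two loops, one closed by the two hydrophobic DPs and one closed by the lysine-rich and glutamic-acid-rich DPs.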

In a preferred embodiment, different dimerization peptides are fused to more than one protein which then
non-covalently associate with one another. In this embodiment, the individual dimerization peptides may
also be separated by linkers inserted in between DP and a protein and/or in between individual DPs. For
example, the following DP fusion proteins can be made: (i) DPhyd-LP-protein1-LP-DPhyd-LG-DPLys and (ii)
DPhyd-LP-protein2-LP-DPhyd-LG-DPGlu (see Figure 2B), wherein DPhyd is a DP comprising mostly
hydrophobic amino acid residues, DPLys is a DP comprising mostly lysine residues, DPGlu is a DP
comprising mostly glutamic acid residues, LP is a linker comprising proline residues, LG is a linker
comprising glycine residues, and protein1 and protein2 are proteins which comprise different protein
sequences. In the above illustration, two individual proteins (protein1 and protein2) are each held in a
compact structure, due to the association of the respective DPs. Upon mixing the two DP-fusion proteins,
they form non-covalently associated dimers, due to the specific association of DPLys with DPGlu, resulting
in a dimer structure which comprises two different compact proteins (protein1 and protein2). In another
embodiment the protein sequences inserted in between the two DPhyds are identical: (i) DPhyd-LP-protein1-
LP-DPhyd-LG-DPLys and (ii) DPhyd-LP-protein1-LP-DPhyd-LG-DPGlu, resulting in a non-covalent double loop
structure comprising two juxtaposed compact structures of the same protein. It will be obvious to those in
the art that a plurality of DP fusion proteins, other than those illustrated herein, can be made.

In a preferred embodiment, different dimerization peptides are fused to more than one protein which non-
covalently associate with one another. In this embodiment, DP-proteins are generated, wherein the DPs
are used to non-covalently associate two or more unconstrained proteins to form constrained structures
(see Figure 2C). In this embodiment, the individual dimerization peptides may also be separated
conveniently by linkers inserted in between DP and a protein. For example, the following DP fusion
proteins can be made: (i) DPhyd-LP-protein1-LP-DPLys and (ii) DPhyd-LP-protein2-LP-DPGlu, wherein DPhyd is
a DP comprising mostly hydrophobic amino acid residues, DPLys is a DP comprising mostly lysine
residues, DPGlu is a DP comprising mostly glutamic acid residues, LP is a linker comprising proline
residues, LG is a linker comprising glycine residues, and protein1 and protein2 are proteins which
comprise different protein sequences. In the above illustration, two individual proteins (protein1 and
protein2) are each held in a compact structure, due to the association of the respective DPs. Upon mixing
the two DP-fusion proteins, they form non-covalently associated dimers, due to the specific association of
DPLys with DPGlu, resulting in a dimer structure which comprises two different compact proteins (protein1
and protein2). In another embodiment the protein sequences inserted in between the two DPhyds are
identical: (i) DPhyd-LP-protein1-LP-DPhyd-LG-DPLys and (ii) DPhyd-LP-protein1-LP-DPhyd-LG-DPGlu, resulting in
a non-covalent double loop structure comprising two juxtaposed compact structures of the same protein.
It will be obvious to those in the art that a plurality of DP fusion proteins, other than those illustrated herein,
can be made.

Other dimerizing protein sequences are known in the art or may be isolated using known screening 
systems, such as the yeast two-hybrid system. 

In one embodiment, each of the two protein sequences (protein1 and protein2), for example, within the
above DPhyd-LP-protein1-LP-DPhyd-LG-DPLys-LP-protein2-LP-DPGlu, has a specific bioactivity, which when
combined in a structure as outlined above, results in a bivalent DP-fusion protein which has a greater
bioactivity than each alone. For example, both compact structures may bind to the same target protein,
however with low affinity. Combining both compact structures into a single bivalent DP-fusion protein as
outlined above may result in much higher affinity for the target protein, and thus the single DP-fusion
protein may be a more potent agonist or antagonist than each isolated DP-protein.



In another preferred embodiment, DP-fusion protein structures as outlined above, such as DPhyd-LP-
protein1-LP-DPhyd-LG-DPLys-LP-protein2-LP-DPGlu, which have a bivalent binding specificity are also useful
for associating two proteins for which they have affinity. In this embodiment, the compact structure
comprising protein1 has affinity to a protein X, and the compact structure comprising protein2 has affinity
to protein Y. Introducing this DP-fusion protein into a cell which expresses both protein X and protein Y
results in binding of the bivalent DP-fusion protein to both protein X and protein Y, which thereby are
brought into close proximity.

Similarly, DP-fusion protein structures as outlined above, such as DPhyd-LP-protein1-LP-DPhyd-LG-DPLys-
LP-protein2-LP-DPGlu, which have a bivalent binding specificity are also useful for associating two cells. The
cells may be identical or different. In this embodiment, the compact structure comprising protein1 has
affinity to a cell surface component X displayed on a first cell. The compact structure comprising protein2
has affinity to a cell surface component Y displayed on a second cell. Co-culturing the first and second
cells and providing this bivalent DP-fusion protein results in binding of the DP-fusion protein to both cell
surface component X and cell surface component Y, which will force the first cell and second cell into close
proximity.

Among the most challenging aspects in gene therapy is the delivery of the gene of interest into a specific
target cell, wherein a genetic defect is sought to be corrected. Several gene delivery systems are known
to those in the art, including, but not limited to, naked DNA, liposome-embedded DNA, and viral systems
comprising retroviruses, adenoviruses, herpesviruses, HIV, etc. However, whatever system is employed,
cell-type specific delivery remains the most critical aspect of gene therapy. In a preferred embodiment,
DP-fusion protein structures as outlined above, such as DPhyd-LP-protein1-LP-DPhyd-LG-DPLys-LP-protein2-
LP-DPGlu, which have a bivalent binding specificity are also useful as tools for associating virus particles
(e.g., a virus that delivers a gene of interest) with the desired target cells. In this embodiment, the
compact structure comprising protein1 has affinity to a surface component X displayed on the virus
and the compact structure comprising protein2 has affinity to a cell surface component Y displayed on a
target cell. Co-culturing the virus and the target cells and providing this bivalent DP-fusion protein results
in binding of the bivalent DP-fusion protein to both viral surface component X and target cell surface
component Y, which will force the virus into close proximity with its target cell. The viral particle thus may
dock to the desired target cell and fuse with the membrane, ensuring gene delivery. Suitable controls are
performed such that the virus does not dock with its target cell without addition of the bivalent DP-fusion
protein.

In another embodiment the protein sequences inserted in between the two DPhyds and in between DPLys
and DPGlu are identical, resulting in a double loop structure comprising two juxtaposed compact structures
of the same protein. This embodiment allows the dimerization of the same protein, which may be a
cellular protein or an extracellular protein component. It will be obvious to those in the art that a plurality
of DP fusion proteins, other than those illustrated herein, can be made.

The DPs or DP-proteins of the present invention may also be modified, as more fully outlined below, to
form fusion proteins comprising a DP or a DP-protein and another, heterologous protein or amino acid
sequence, usually referred to as a fusion partner.

The term "fusion protein" or "chimeric protein" refers to a protein composed of at least two proteins that,
while typically unjoined in their native state, typically are joined by their respective amino and carboxyl
termini through a peptide linkage to form a single continuous protein. It will be appreciated that the
protein components can be directly joined or joined through a peptide linker/spacer.

By "fusion partner" herein is meant a sequence that is associated with DP or DP-protein and confers upon
DP or DP-protein an additional function or ability. Suitable fusion partners include, but are not limited to:
a) tag sequences (also referred to as rescue sequences), as defined below, which allow the purification or
isolation of either the DP or DP-protein or the nucleic acids encoding them; b) targeting sequences,
defined below, which allow the localization of DP or DP-protein to a subcellular or extracellular
compartment; c) stability sequences, which confer stability or protection from degradation to DP or DP-
protein, for example resistance to proteolytic degradation; or d) any combination of a), b), and c), as well
as linker sequences as needed. It is well known to those in the art that fusion proteins preferably are
generated by in vitro mutagenesis and genetic engineering, whereby the nucleic acid encoding the
respective fusion protein is modified accordingly. Suitable methods can be found, for example, in
Sambrook et al., Molecular Cloning: A Laboratory Manual (New York: Cold Spring Harbor Laboratory
Press, 1989) and Ausubel et al., Short Protocols in Molecular Biology (John Wiley & Sons, Inc., 1995).

In a preferred embodiment, the fusion partner comprises a tag polypeptide which provides an epitope to 
which an anti-tag antibody can selectively bind or an epitope comprising a purification sequence. The 
epitope tag is generally, but not required to be, placed at the amino- or carboxyl-terminus of DP or DP-
protein. The presence of such epitope-tagged forms of DP or DP-protein can be detected using an
antibody against the tag polypeptide. Also, the use of the tag enables the protein to be readily purified by 
affinity purification using an anti-tag antibody or another type of affinity matrix that binds to the epitope 
tag. In an alternative embodiment, the chimeric molecule may comprise a fusion of DP or DP-protein 
with an immunoglobulin or a particular region of an immunoglobulin. For a bivalent form of the chimeric 
molecule, such a fusion could be to the Fc region of an IgG molecule or to GST (glutathione S-
transferase).



Various tag polypeptides and their respective antibodies are well known in the art. Examples include
poly-histidine (poly-his) or poly-histidine-glycine (poly-his-gly) tags; the flu HA tag polypeptide and its
antibody 12CA5 [Field et al., Mol. Cell. Biol., 8:2159-2165 (1988)]; the c-myc tag and the 8F9, 3C7, 6E10,
G4, B7 and 9E10 antibodies thereto [Evan et al., Mol. Cell. Biol., 5:3610-3616 (1985)]; and the Herpes
Simplex virus glycoprotein D (gD) tag and its antibody [Paborsky et al., Protein Eng., 3(6):547-553
(1990)]. Other tag polypeptides include the Flag-peptide [Hopp et al., Mol. Immunol., 33:601-8 (1996);
Brizzard et al., Biotechniques 16(4):730-735 (1994); Knappik and Pluckthun, Biotechniques 17(4):754-61
(1994)], the KT3 epitope peptide [Martin et al., Science, 255:192-194 (1992)], the tubulin epitope peptide
[Skinner et al., J. Biol. Chem., 266:14163-14166 (1991)], and the T7 gene 10 protein peptide tag [Lutz-
Freyermuth et al., Proc. Natl. Acad. Sci. USA, 87:6393-6397 (1990)]. Alternatively, for example, fusion
proteins encompassing poly-his tags are efficiently purified on metal (Ni) affinity resins.

In a preferred embodiment, a tag sequence (also referred to as a rescue sequence) is used to isolate the
nucleic acid encoding the DP-protein (see also below). In this embodiment the rescue sequence may be
a unique oligonucleotide sequence which serves as a probe target site to allow quick and easy isolation of
the nucleic acid construct (see below), via PCR, hybridization, or related techniques.

In a preferred embodiment, the fusion partner is a targeting sequence. As will be appreciated by those in
the art, the localization of proteins within a cell is a simple method for increasing effective concentration
and determining function. These mechanisms are thought to rely on the principle of limiting the search
space for ligands, that is to say, the localization of a protein to the plasma membrane limits the search for
its ligand to that limited dimensional space near the membrane as opposed to the three dimensional
space of the cytoplasm. Alternatively, the concentration of a protein can also be simply increased by
nature of the localization, for example, shuttling the proteins into the nucleus confines them to a smaller
space thereby increasing concentration.



Thus, suitable targeting sequences include, but are not limited to, (i) sequences capable of causing 
binding of the respective protein to a predetermined molecule or class of molecules while retaining 
bioactivity of the expression product, (for example by using enzyme inhibitor or substrate sequences to 
target a class of relevant enzymes); (ii) sequences signaling selective degradation, of itself or co-bound 
proteins; and (iii) signal sequences capable of constitutively localizing the candidate expression products 
to a predetermined cellular locale, including (a) subcellular locations such as the Golgi apparatus,
endoplasmic reticulum, nucleus, nucleoli, nuclear membrane, mitochondria, chloroplast, secretory 
vesicles, lysosome, and cellular membrane; and (b) extracellular locations via a secretory signal [see, von 
Heijne, EXS 73:67-76 (1995); von Heijne, Subcell. Biochem. 22:1-19 (1994) and von Heijne, Curr. Opin. 
Cell. Biol. 2(4):604-8 (1990)]. Particularly preferred is localization to either subcellular locations or to the 
outside of the cell via secretion.

In a preferred embodiment, the fusion partner is a nuclear localization signal (NLS). NLSs are generally
short, positively charged (basic) domains that serve to direct the entire protein in which they occur to the
cell's nucleus. Numerous NLS amino acid sequences have been reported including: (i) single basic
NLSs such as that of the SV40 (monkey virus) large T antigen [Pro Lys Lys Lys Arg Lys Val; Kalderon et
al., Cell 39:499-509 (1984)]; the human retinoic acid receptor-β nuclear localization signal [ARRRRP;
Hamy et al., Bioconjug. Chem. 2(5):375-8 (1991)]; NFκB p50 [EEVQRKRQKL; Ghosh et al., Cell 62:1019-
1029 (1990)]; NFκB p65 [EEKRKRTYE; Nolan et al., Cell 64:961-969 (1991)]; and others [see for
example Boulikas, J. Cell. Biochem. 55(1):32-58 (1994)], hereby incorporated by reference; and (ii) double
basic NLSs exemplified by that of the Xenopus laevis (African clawed toad) protein, nucleoplasmin [Ala
Val Lys Arg Pro Ala Ala Thr Lys Lys Ala Gly Gln Ala Lys Lys Lys Lys Leu Asp; Dingwall et al., Cell
30:449-458 (1982) and Dingwall et al., J. Cell Biol. 107:841-849 (1988)]. Numerous localization studies
have demonstrated that NLSs incorporated in synthetic peptides or grafted onto reporter proteins not
normally targeted to the cell nucleus cause these peptides and reporter proteins to be concentrated in the
nucleus. See, for example, Dingwall and Laskey, Annu. Rev. Cell. Biol., 2:367-390 (1986); Bonnerot et
al., Proc. Natl. Acad. Sci. USA, 84:6795-6799 (1987) and Galileo et al., Proc. Natl. Acad. Sci. USA,
87:458-462 (1990).
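
The NLS sequences quoted above in three-letter code can be converted to one-letter code and checked for their basic character with a short Python sketch. The helper names are hypothetical, the lookup table covers only the residues that occur in the two quoted sequences, and counting only lysine and arginine as basic residues is a simplifying assumption.

```python
# Three-letter to one-letter codes for the residues used in the two quoted NLSs.
THREE_TO_ONE = {
    "Ala": "A", "Arg": "R", "Asp": "D", "Gln": "Q", "Gly": "G",
    "Leu": "L", "Lys": "K", "Pro": "P", "Thr": "T", "Val": "V",
}


def to_one_letter(three_letter: str) -> str:
    """Convert a space-separated three-letter sequence to one-letter code."""
    return "".join(THREE_TO_ONE[res] for res in three_letter.split())


def basic_count(seq: str) -> int:
    """Count lysine (K) and arginine (R) residues in a one-letter sequence."""
    return sum(seq.count(res) for res in "KR")


sv40 = to_one_letter("Pro Lys Lys Lys Arg Lys Val")
nucleoplasmin = to_one_letter(
    "Ala Val Lys Arg Pro Ala Ala Thr Lys Lys Ala Gly Gln Ala Lys Lys Lys Lys Leu Asp"
)

for name, seq in (("SV40 large T antigen", sv40), ("nucleoplasmin", nucleoplasmin)):
    print(name, seq, f"{basic_count(seq)}/{len(seq)} basic residues")
```

The single basic SV40 signal has one cluster of basic residues, while the nucleoplasmin signal has two clusters separated by a spacer, which is the distinction the paragraph draws.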

In a preferred embodiment, the fusion partner is a membrane anchoring signal sequence. This is
particularly useful since many parasites and pathogens bind to the membrane, in addition to the fact that
many intracellular events originate at the plasma membrane. Thus, membrane-bound DP-proteins are
useful for both the identification of important elements in these processes as well as for the discovery of
effective inhibitors or activators. The invention provides methods for presenting the DP-protein
extracellularly or in the cytoplasmic space. For extracellular presentation, a membrane anchoring region
is provided at the carboxyl terminus of the DP-protein. The DP-protein is exposed on the cell surface and
presented to the extracellular space, such that it can bind to other surface molecules (affecting their
function) or molecules present in the extracellular medium. The binding of such molecules could confer
function on the cells expressing a DP-protein that binds the molecule. The cytoplasmic region could be
neutral or could contain a domain that, when the extracellular DP-protein is bound by a target protein or
test protein, confers a function on the cells (activation of a kinase, phosphatase, binding of other cellular
components to effect function). Similarly, the DP-protein-containing region could be contained within a
cytoplasmic region, and the transmembrane region and extracellular region remain constant or have a
defined function.

WO 99/51625 PCT/US99/07374



Membrane-anchoring sequences are well known in the art and are based on the genetic geometry of 
mammalian transmembrane molecules. Peptides are inserted into the membrane based on a secretory 
signal sequence and require a hydrophobic transmembrane domain. Of course, if a transmembrane 
domain is placed amino-terminal to the DP-protein region, it will serve to anchor the DP-protein as an 
intracellular domain, which may be desirable in some embodiments. Secretory signal sequences and 
transmembrane domains are known for a wide variety of membrane bound proteins, and these

sequences may be used accordingly, either as pairs from a particular protein or with each component 
being taken from a different protein, or alternatively, the sequences may be synthetic, and derived entirely 
from consensus as artificial delivery domains. 



As will be appreciated by those in the art, membrane-anchored protein sequences, including both SS and
TM, are known for a wide variety of proteins and any of these may be used. Particularly preferred
membrane-anchoring sequences include, but are not limited to, those derived from CD8, ICAM-2, IL-8R,
CD4 and LFA-1.






Useful sequences include sequences from: (i) class I integral membrane proteins such as IL-2 receptor
beta-chain [residues 1-26 are the signal sequence, residues 241-265 are the transmembrane residues;
see Hatakeyama et al., Science 244:551-556 (1989) and von Heijne and Gavel, Eur. J. Biochem. 174:671-
678 (1988)] and insulin receptor beta chain [residues 1-27 are the signal sequence, residues 957-959 are
the transmembrane domain and residues 960-1382 are the cytoplasmic domain; see Hatakeyama, supra,
and Ebina et al., Cell 40:747-758 (1985)]; (ii) class II integral membrane proteins such as neutral
endopeptidase [residues 29-51 are the transmembrane domain, residues 2-28 are the cytoplasmic
domain; see Malfroy et al., Biochem. Biophys. Res. Commun. 144:59-66 (1987)]; (iii) type III proteins
such as human cytochrome P450 NF25 (Hatakeyama, supra); and (iv) type IV proteins such as human P-
glycoprotein (Hatakeyama, supra). Particularly preferred are CD8 and ICAM-2. For example, the signal
sequences from CD8 and ICAM-2 lie at the extreme 5' end of the transcript. These sequences encode
the amino acids 1-32 in the case of CD8 [MASPLTRFLSLNLLLLGESILGSGEAKPQAP; Nakauchi et al.,
Proc. Natl. Acad. Sci. USA 82:5126-30 (1985)] and 1-21 in the case of ICAM-2
[MSSFGYRTLTVALFTLICCPG; Staunton et al., Nature 339:61-64 (1989)]. These leader sequences
deliver the construct to the membrane while the hydrophobic transmembrane domains, placed carboxy-
terminal to the DP-protein region, serve to anchor the construct in the membrane. These transmembrane
domains are encompassed by amino acids 145-195 from CD8
(PQRPEDCRPRGSVKGTGLDFACDIYIWAPLAGICVALLLSLIITLICYHSR; Nakauchi, supra) and 224-256
from ICAM-2 (MVIIVTWSVLLSLFVTSVLLCFIFGQHLRQQR; Staunton, supra).

Alternatively, membrane anchoring sequences include the GPI anchor, which results in a covalent bond
between the molecule and the lipid bilayer via a glycosyl-phosphatidylinositol bond, for example in DAF
[PNKGSGTTSGTTRLLSGHTCFTLTGLLGTLVTMGLLT, with the bolded serine being the site of the
anchor; see Homans et al., Nature 333(6170):269-72 (1988), and Moran et al., J. Biol. Chem. 266:1250-
1257 (1991)]. In order to do this, the GPI sequence from Thy-1 can be inserted 3' of the variable region in
place of a transmembrane sequence.

It is within the scope of this invention to display the DP-protein on membranes of viral, archaebacterial,
prokaryotic and eukaryotic origin. In this embodiment, the DP-protein is fused to a membrane protein
such that after insertion into the membrane, the DP-protein region will be located on the outside of the
virus, archaebacterium, prokaryote or eukaryotic cell and thus be accessible for binding target molecules,
e.g., when screening for binding target molecules. Prokaryotic surface display systems include, for
example, functional fusions to surface proteins such as flagellin [Lu et al., Biotechnology 13(4):366-72
(1995)] and ice-nucleation protein [Jung et al., Nat. Biotechnol. 16(6):576-80 (1998)]. Other prokaryotic
protein display systems are reviewed by Stahl and Uhlen, Trends Biotechnol. 15(5):185-92 (1997) and
Georgiou et al., Nat. Biotechnol. 15(1):29-34 (1997). Viral display systems include, but are not limited to,
(i) filamentous bacteriophages such as M13 and derivatives [for review see Felici et al., Biotechnol. Annu.
Rev. 1:149-83 (1995)]; (ii) bacteriophage T4 [Jiang et al., Infect. Immun. 65(11):4770-7 (1997)]; (iii)
bacteriophage lambda [Stolz et al., FEBS Lett. 440(1-2):213-7 (1998)]; (iv) tomato bushy stunt virus
[Joelson et al., J. Gen. Virol. 78(Pt 6):1213-7 (1997)]; and (v) retrovirus [Buchholz et al., Nat. Biotechnol.
16(10):951-4 (1998)]. Yeast display systems, for example, employ C-terminal fusions to the Aga2p
mating adhesion receptor of Saccharomyces cerevisiae [Boder and Wittrup, Nat. Biotechnol. 15(6):553-7
(1997)]. Display of proteins using any of the above listed systems or mammalian transmembrane
proteins (some of which are described herein) is generally achieved by inserting the nucleic acid encoding
the DP-protein (or any other protein of interest) in frame with an amino-terminal secretion signal and a C-
terminal transmembrane anchoring domain (as further described below).

Similarly, myristylation sequences can serve as membrane anchoring sequences. It is known that the 
myristylation of c-src recruits it to the plasma membrane. This is a simple and effective method of 
membrane localization, given that the first 14 amino acids of the protein are solely responsible for this 
function: MGSSKSKPKDPSQR (see Cross et al., Mol. Cell. Biol. 4(9):1834-1842 (1984); Spencer et al.,
Science 262:1019-1024 (1993), both of which are hereby incorporated by reference). This motif has
already been shown to be effective in the localization of reporter genes and can be used to anchor the
zeta chain of the TCR. This motif is placed amino-terminal to the variable region in order to localize the
fusion protein to the plasma membrane. Other modifications such as palmitoylation can be used to anchor
fusion proteins in the plasma membrane; for example, palmitoylation sequences from the G protein-
coupled receptor kinase GRK6 sequence [LLQRLFSRQDCCGNCSDSEEELPTRL, with the bold
cysteines being palmitoylated; Stoffel et al., J. Biol. Chem. 269:27791-4 (1994)]; from rhodopsin
[KQFRNCMLTSLCCGKNPLGD; Barnstable and Morabito, J. Mol. Neurosci. 5(3):207-9 (1994)]; and the
p21 H-ras 1 protein [LNPPDESGPGCMSCKCVLS; Capon et al., Nature 302:33 (1983); Cadwallader et
al., Mol. Cell. Biol. 14(7):4722-30 (1994)].

In a preferred embodiment, the fusion partner is a lysosomal targeting sequence, including, for example, a
lysosomal degradation sequence such as Lamp-2 [KFERQ; Dice, Ann. N.Y. Acad. Sci. 674:58-64 (1992)]
or hrsosoma, membrane sequences from Lamp-1 t«LIP,AGFFALAGLVL,VUAyL lfiE1 <BSH^QI! 
Uthayakumar et al., Cell. Mol. Biol. Res. 41:405-20 (1995)) or Lamp-2

^ Z?sZT^ LAYF ' GUSm>mE ' 61 81 • ~ R es Comm. 

£ JZT " teh show ,ne '™ brane doma,ns in ™ and "» ~ 



Alternatively, the fusion partner may be a mitochondrial localization sequence, including mitochondrial
matrix sequences (e.g., yeast alcohol dehydrogenase III; MLRTSSLFTRRVQPSLFSRNILRLQST; Schatz,
Eur. J. Biochem. 165:1-6 (1987)); mitochondrial inner membrane sequences (yeast cytochrome c oxidase
subunit IV; MLSLRQSIRFFKPATRTLCSSRVLL; Schatz, supra); mitochondrial intermembrane space
sequences (yeast cytochrome c1;
MFSMLSKRWAQRTLSKSFYSTATGAASKSGKLTQKLVTAGVAAAGITASTLLYADSLTAEAMTA; Schatz,
supra) or mitochondrial outer membrane sequences (yeast 70 kD outer membrane protein;
MKSFITRNKTAILATVAATGTAIGAYYYYNQLQQQQQRGKK; Schatz, supra).

The fusion partner may also be derived from endoplasmic reticulum sequences, including a sequence
derived from calreticulin [KDEL; Pelham, Proc. R. Soc. Lond. B. Biol. Sci. 250:1-10 (1992)] or from
adenovirus E3/19K protein [LYLSRRSFIDEKKMP; Jackson et al., EMBO J. 9:3153-62 (1990)].

Furthermore, targeting sequences also include peroxisome sequences [for example, the peroxisome
matrix sequence from luciferase; SKL; Keller et al., Proc. Natl. Acad. Sci. USA 84:3264-8 (1987)];
farnesylation sequences [for example, p21 H-ras 1; LNPPDESGPGCMSCKCVLS, with the bold cysteine
farnesylated; Capon, supra; Zhang et al., Biochemistry 35(25):8166-71 (1996)]; geranylgeranylation
sequences [for example, protein rab-5A; LTEPTQPTRNQCCSN, with the bold cysteines
geranylgeranylated; Farnsworth, Proc. Natl. Acad. Sci. USA 91:11963-7 (1994)]; or destruction
sequences [cyclin B1; RTALGDIGN; Klotzbucher et al., EMBO J. 15(12):3053-64 (1996)].

In a preferred embodiment, the targeting sequence is a secretory signal sequence capable of effecting 
the secretion of the DP-protein. There is a large number of known secretory signal sequences which, for 
example, when placed amino-terminal to the DP-protein region are cleaved from the respective fusion 
protein during the secretion process. 

Suitable secretory signal sequences include those from IL-2 [MYRMQLLSCIALSLALVTNS; Villinger et
al., J. Immunol. 155:3946-54 (1995)]; growth hormone [MATGSRTSLLLAFGLLCLPWLQEGSAFPT;
Roskam and Rougeon, Nucleic Acids Res. 7:305-20 (1979)]; preproinsulin
[MALWMRLLPLLALLALWGPDPAAAFVN; Bell et al., Nature 284:26-32 (1980)]; and influenza HA protein
[MKAKLLVLLYAFVAGDQI; Sekiwawa and Lai, Proc. Natl. Acad. Sci. USA 80:3563-71 (1983)], with
cleavage between the non-underlined-underlined junction. A particularly preferred secretory signal
sequence is the secretory signal sequence from the secreted cytokine IL-4, which comprises the first 24
amino acids of IL-4 as follows: MGLTSQLLPPLFFLLACAGNFVHG. Other secretory signal peptides are
discussed in von Heinje, supra.

In a preferred embodiment, the fusion partner is a stability sequence which confers stability to DP or DP-
protein or the nucleic acid encoding them. Thus, for example, proteins may be stabilized by the
incorporation of glycines after the initiation methionine (MG or MGG), for protection of the protein against
ubiquitination as per Varshavsky's N-End Rule [Bachmair et al., Science 234:179-86 (1986); Gonda et
al., J. Biol. Chem. 264:16700-12 (1989); Varshavsky, Genes Cells 2(1):13-28 (1997)], thus conferring
long half-life in the cytoplasm. Similarly, one or two prolines at the C-terminus yield peptides that are
largely resistant to carboxypeptidase action. The presence of two glycines prior to the prolines both imparts
flexibility and prevents structure-initiating events in the di-proline from being propagated into the candidate
peptide structure. Thus, preferred stability sequences are as follows: MG(X)nGGPP, MG(X)nGPP,
MGG(X)nGGPP, and MGG(X)nGPP, wherein X is any amino acid and n is an integer of at least four.
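
The four stability patterns listed above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `stabilized_peptide`, its parameters, and the example core sequence are hypothetical and do not come from the specification; the only rules encoded are those given in the text (one or two glycines after the initiating methionine, one or two glycines before a C-terminal di-proline, and a core of at least four residues).

```python
# Hypothetical helper: wraps a candidate core peptide (X)n in the
# stability flanks described in the text, e.g. MG(X)nGGPP.
AMINO_ACIDS = set("ACDEFGHIKLMNPQRSTVWY")  # the 20 standard one-letter codes


def stabilized_peptide(core, n_term_glycines=1, c_term_glycines=2):
    """Return M + G{1,2} + core + G{1,2} + PP for a core of >= 4 residues."""
    if n_term_glycines not in (1, 2):
        raise ValueError("the text describes MG or MGG at the N-terminus")
    if c_term_glycines not in (1, 2):
        raise ValueError("the text describes GPP or GGPP at the C-terminus")
    if len(core) < 4:
        raise ValueError("n must be an integer of at least four")
    if not set(core) <= AMINO_ACIDS:
        raise ValueError("core must use standard one-letter amino acid codes")
    return "M" + "G" * n_term_glycines + core + "G" * c_term_glycines + "PP"


if __name__ == "__main__":
    # Placeholder core sequence, chosen only for illustration.
    print(stabilized_peptide("ACDEF"))        # MG(X)5GGPP form
    print(stabilized_peptide("ACDEF", 2, 1))  # MGG(X)5GPP form
```

The length check mirrors the "n is an integer of at least four" condition; whether a given core actually benefits from these flanks would still have to be determined experimentally.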

In a preferred embodiment, to increase the solubility of the DP-protein, lysines are added to the N-
terminus, which may or may not comprise a glycine spacer. For example, the DP-protein K^-EFLIVKS-
protein-EFLIVKS can be made, which has different characteristics than the DP-protein without the KgG 4
sequence added (see Examples). In this embodiment, the number of lysine residues and linker sequence
can be determined experimentally to ensure the resulting DP-protein has the desired characteristics.

In a preferred embodiment, combinations of fusion partners are used. Thus, for example, any number of
combinations of fusion partners, targeting sequences, rescue sequences, and stability sequences may be
used, with or without linker sequences. As is more fully described below, using a base vector that
contains at least one cloning site for receiving random and/or biased libraries, one can cassette in nucleic
acids encoding various fusion partners 5' and 3' of the nucleic acid encoding the DP-protein.

In a preferred embodiment, the DPs, DP-proteins, DPs fused to a fusion partner or DP-proteins fused to a 
fusion partner of the invention can be further modified. 

A compound wherein at least one dimerization peptide (DP) is fused to a protein of interest (P), for
example, yielding DP-P, P-DP, DP-P-DP or similar compounds, as more fully described above, wherein
DP is the dimerization peptide and P is a protein of interest, is collectively referred to as "DP-protein".
Covalent modifications of DP and DP-proteins are included within the scope of this invention.

One type of covalent modification includes reacting targeted amino acid residues with an organic
derivatizing agent that is capable of reacting with selected side chains or the N- or C-terminal residues of
DP or DP-protein. Derivatization with bifunctional agents is useful, for instance, for crosslinking DP or
DP-protein to a water-insoluble support matrix or surface for use in the method for purifying anti-DP or
anti-DP-protein antibodies or screening assays, as is more fully described below. Commonly used
crosslinking agents include, e.g., 1,1-bis(diazoacetyl)-2-phenylethane, glutaraldehyde, N-hydroxy-
succinimide esters, for example, esters with 4-azidosalicylic acid, homobifunctional imidoesters including
disuccinimidyl esters such as 3,3'-dithiobis(succinimidylpropionate), bifunctional maleimides such as bis-
N-maleimido-1,8-octane and agents such as methyl-3-[(p-azidophenyl)dithio]propioimidate.

Other modifications include deamidation of glutaminyl and asparaginyl residues to the corresponding
glutamyl and aspartyl residues, respectively, hydroxylation of proline and lysine, phosphorylation of
hydroxyl groups of seryl or threonyl residues, methylation of the amino groups of lysine, arginine and
histidine side chains [T. E. Creighton, in Proteins: Structure and Molecular Properties, W. H. Freeman &
Co., San Francisco, pp. 79-86 (1983)], acetylation of the N-terminal amine, and amidation of any C-
terminal carboxyl group.

Another type of covalent modification of DP or DP-protein included within the scope of this invention
comprises altering the native glycosylation pattern of the polypeptide. "Altering the native glycosylation
pattern" is intended for purposes herein to mean deleting one or more carbohydrate moieties found in
either DP or DP-protein, and/or adding one or more glycosylation sites that are not present in either DP or
DP-protein.






Addition of glycosylation sites to DP or DP-protein may be accomplished by altering the amino acid
sequence thereof. The alteration may be made, for example, by the addition of, or substitution by, one or
more serine or threonine residues to the native sequence of DP or DP-protein (for O-linked glycosylation
sites). The DP or DP-protein amino acid sequence may optionally be altered through changes at the
DNA level, particularly by mutating the DNA encoding DP or DP-protein at preselected bases such that
codons are generated that will translate into the desired amino acids. Methods for introducing mutations 
into DNA by in vitro mutagenesis are well known to those in the art and can be found, for example, in 
Sambrook et al., Molecular Cloning: A Laboratory Manual (New York: Cold Spring Harbor Laboratory 
Press, 1989) and Ausubel et al., Short Protocols in Molecular Biology (John Wiley & Sons, Inc., 1995).

Another means of increasing the number of carbohydrate moieties on DP or DP-protein is by chemical or 
enzymatic coupling of glycosides to the polypeptide. Such methods are described in the art, for example, 
in WO 87/05330 published 11 September 1987, and in Aplin and Wriston, CRC Crit. Rev. Biochem., 
10(4):259-306 (1981).

Removal of carbohydrate moieties present on DP or DP-protein may be accomplished chemically or 
enzymatically or by mutational substitution of codons encoding amino acid residues that serve as targets 
for glycosylation. Chemical deglycosylation techniques are known in the art and described, for instance,
by Sojar and Bahl, Arch. Biochem. Biophys., 259:52-57 (1987) and by Edge et al., Anal. Biochem.,
118:131-137 (1981). Enzymatic cleavage of carbohydrate moieties on polypeptides can be achieved by
the use of a variety of endo-and exo-glycosidases as described by Thotakura and Bahl, Meth. Enzymol., 
138:350-359 (1987). 

Another type of covalent modification comprises linking a DP or a DP-protein to one of a variety of 
nonproteinaceous polymers, e.g., polyethylene glycol, polypropylene glycol, or polyoxyalkylenes, in the 
manner set forth in U.S. Patent Nos. 4,640,835; 4,496,689; 4,301,144; 4,670,417; 4,791,192 or
4,179,337. 

As will be appreciated by those in the art, the DPs, DP-proteins, and fusion proteins of the invention can 
be made in a variety of ways.

In a preferred embodiment, the DPs, DP-proteins, and fusion proteins are made synthetically, as is well
known in the art.

In a preferred embodiment, the DPs, DP-proteins, and fusion proteins are encoded by nucleic acids, as is
well known in the art.

In a preferred embodiment, the DP-proteins, including candidate DP-proteins, are translation products of
nucleic acids. The candidate DP-protein comprises a randomized test protein. That is, every candidate
DP-protein has a randomized portion, as defined above, that is the basis of the screening methods
outlined below. In addition to the randomized portion, the candidate DP-protein may also include a fusion
partner. In this embodiment, the nucleic acids are introduced into cells, and the cells express the nucleic
acids to generate DP-proteins (or candidate DP-proteins).

As outlined above, the DP-proteins are encoded by nucleic acids. A "nucleic acid", or "oligonucleotide" or
a grammatical equivalent thereof herein means at least two nucleotide residues covalently linked
together. A nucleic acid of the present invention will generally contain phosphodiester bonds.
Modifications of the ribose-phosphate backbone may be done to facilitate the addition of additional
moieties such as labels, or to increase the stability and half-life of such molecules in physiological
environments. The nucleic acids may be single stranded or double stranded, or contain portions of both
double stranded and single stranded sequence. The nucleic acid may be RNA, comprising RNA, mRNA
and defined or random ribo-oligonucleotides. The nucleic acid may be DNA, comprising genomic DNA,
cDNA and defined or random deoxyribo-oligonucleotides. The nucleic acid may also be a hybrid, where
the nucleic acid contains any combination of deoxyribo- and ribonucleotides, and any combination of
nucleotide bases.

The nucleic acids encode the DP-proteins and the fusion partners, if present. In addition, the nucleic
acids will also generally contain extra sequences to effect translation or transcription, as necessary.

Usually, the nucleic acid encoding the DP-proteins is incorporated into a suitable vector such as plasmid
vectors or retroviral vectors. In a preferred embodiment, when plasmid vectors are used to express the
DP-proteins, the nucleic acid is generally DNA. In another preferred embodiment, when retroviral vectors
are used to express the DP-proteins, the nucleic acid is generally RNA.

In a preferred embodiment, vectors are used to express candidate DP-proteins. By "vector" herein is
meant a replicon which comprises nucleic acid and can be used for the transformation of host cells. The
vectors may be either self-replicating extrachromosomal vectors, referred to as "plasmids" or "plasmid
vectors", or vectors which integrate into a host genome. A preferred embodiment utilizes retroviral
vectors, as is more fully described below.

For non-retroviral embodiments, suitable vectors are derived from any number of known vectors
including, but not limited to, pcDNA3.1 (Invitrogen), pSI (Promega Corporation), and pBI (Clontech
Laboratories, Inc.). Basically, any mammalian expression vectors with strong promoters such as CMV
can be used to construct vectors expressing DP-proteins.

Generally, these expression vectors include transcriptional and translational regulatory nucleic acid
operably linked to nucleic acids which are to be expressed. "Operably linked" in this context means that
the transcriptional and translational regulatory nucleic acid is positioned relative to a coding sequence
(e.g., encoding DP-protein) in such a manner that transcription is initiated and translation of the protein is
assured. Generally, this will mean that the promoter and transcriptional initiation or start sequences are
positioned 5' to the coding region. The transcriptional and translational regulatory nucleic acid will
generally be appropriate to the host cell used, as will be appreciated by those in the art. Numerous types 
of appropriate expression vectors, and suitable regulatory sequences, are known in the art for a variety of 
host cells. 

In general, the transcriptional and translational regulatory sequences may include, but are not limited to,
promoter sequences (including CAAT box and TATA box), ribosomal binding sites (including internal
ribosome entry sites (IRES)), transcriptional start and stop sequences (including mRNA polyadenylation
sequence 5'-AATAAA-3'), RNA splicing sequences, translational start and stop sequences (including 5'
and 3' untranslated regions, initiator codon (ATG), Kozak consensus sequence (5'-A/GNNATGG-3') and
nonsense codons (UAA, UAG, UGA)), either constitutive or inducible enhancer, activator or repressor
sequences (located either upstream, downstream or overlapping relative to the promoter and being either
cell-line dependent, tissue-specific or temporally dependent), and protein targeting signals (including
signals for endoplasmic reticulum retention and extracellular secretion, signals for localization to plasma
membranes, peroxisomes, nucleus, mitochondria, lysosomes, Golgi complex and focal adhesions).
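
Two of the short regulatory motifs named above, the Kozak consensus 5'-A/GNNATGG-3' and the polyadenylation signal 5'-AATAAA-3', are simple enough to locate by pattern matching. The following Python sketch is offered only as an illustration; the function names and the test sequence are hypothetical, and a real construct would be checked with established sequence-analysis tools rather than this minimal scan.

```python
import re

# Kozak consensus from the text: (A or G), any two bases, ATG, then G.
# A lookahead is used so that overlapping candidate sites are all reported.
KOZAK = re.compile(r"(?=([AG][ACGT]{2}ATGG))")
POLYA_SIGNAL = "AATAAA"  # mRNA polyadenylation sequence from the text


def find_kozak_start_codons(dna):
    """Return 0-based positions of the A of each ATG lying in a Kozak context."""
    dna = dna.upper()
    return [m.start(1) + 3 for m in KOZAK.finditer(dna)]


def find_polya_signals(dna):
    """Return 0-based start positions of every AATAAA occurrence."""
    dna = dna.upper()
    width = len(POLYA_SIGNAL)
    return [i for i in range(len(dna) - width + 1) if dna[i:i + width] == POLYA_SIGNAL]


if __name__ == "__main__":
    # Made-up sequence used only to exercise the two functions.
    example = "TTGCCACCATGGCTTAAGGGAATAAACC"
    print(find_kozak_start_codons(example))
    print(find_polya_signals(example))
```

Positions are reported on the DNA strand as written; the nonsense codons in the text are given in RNA letters (UAA, UAG, UGA) and would appear as TAA, TAG and TGA in such a scan.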

In a preferred embodiment, the regulatory sequences include a promoter and transcriptional start and
stop sequences. Promoter sequences include constitutive and inducible promoter sequences [for
example, see Walther and Stein, J. Mol. Med. 74(7):379-92 (1996)]. In a preferred embodiment, the
promoters are constitutive and drive the expression of, e.g., the DP-protein encoding nucleic acid at a
high level. The promoters may be either naturally occurring promoters, hybrid or synthetic promoters.
Hybrid promoters, which combine elements of more than one promoter, are also known in the art, and are
useful in the present invention.

Particularly preferred promoters for expression in mammalian cells are CMV promoters. Preferred
retroviral promoters are discussed below.

In a preferred embodiment, the promoter is associated with at least one copy of a nucleic acid encoding
the DP-protein. Individual components encoding parts of the fusion protein, such as the dimerization
peptide, the protein of interest and one or more fusion partners, can be inserted in a parental vector which
comprises at least one suitable cloning site, preferably 3' to the promoter sequence. In a preferred
embodiment, the fusion protein encoding nucleic acid is composed of individual components to generate
a fusion protein such as DP-L-protein-L-DP or N-DP-L-protein-L-DP, wherein 'N' is a nuclear localization
signal, 'DP' is a dimerization peptide, 'L' is a linker sequence and 'protein' is a protein of interest. As
discussed in detail above, many combinations of nucleic acid components encoding individual
components of the fusion protein can be constructed.
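
The modular layouts DP-L-protein-L-DP and N-DP-L-protein-L-DP can be pictured as a simple concatenation of named parts. The Python sketch below works at the amino acid level and is an illustration only: `assemble_fusion` and the `PARTS` table are hypothetical names, EFLIVKS is used because it appears as a flanking peptide elsewhere in the text, and the nuclear localization signal, linker and protein-of-interest sequences are assumptions chosen for the example, not sequences specified here.

```python
# Hypothetical parts table; only "DP" is taken from the text.
PARTS = {
    "N": "PKKKRKV",           # assumed NLS (the SV40 large T antigen motif)
    "DP": "EFLIVKS",          # flanking peptide that appears in the text
    "L": "GGGGS",             # assumed flexible linker
    "protein": "ACDEFGHIKL",  # placeholder protein of interest
}


def assemble_fusion(layout, parts):
    """Concatenate parts named in a dash-separated layout, N- to C-terminus."""
    names = layout.split("-")
    unknown = [name for name in names if name not in parts]
    if unknown:
        raise KeyError("layout refers to undefined parts: %s" % unknown)
    return "".join(parts[name] for name in names)


if __name__ == "__main__":
    print(assemble_fusion("DP-L-protein-L-DP", PARTS))
    print(assemble_fusion("N-DP-L-protein-L-DP", PARTS))
```

In practice the assembly is done at the nucleic acid level, where each cassette must additionally be kept in frame with its neighbours; the sketch ignores that step.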
Generation of such vectors is performed using methods known to those in the art, which are, for example,
described in Sambrook et al., Molecular Cloning: A Laboratory Manual (New York: Cold Spring Harbor
Laboratory Press, 1989) and Ausubel et al., Short Protocols in Molecular Biology (John Wiley & Sons,
Inc., 1995). Pre-configured vectors are suitable to be included in kits. The end user of such vectors will
have to insert the nucleic acid encoding a protein of interest or a library of proteins of interest into
convenient cloning sites.

In another preferred embodiment, a rescue sequence is used to isolate the nucleic acid encoding the DP-
protein. In this embodiment the rescue sequence may be a unique oligonucleotide sequence which
serves as a probe target site to allow quick and easy isolation of the nucleic acid construct, via PCR,
hybridization, or related techniques.

In addition, the vector may comprise additional elements such as an origin of replication, selection genes,
etc., as is more fully described in Kriegler, in Gene Transfer and Expression: A Laboratory Manual,
Freeman and Company, New York (1990) and Murray, Methods in Molecular Biology, Vol. 7: Gene
Transfer and Expression Protocols, Humana Press (1991).

The nucleic acid encoding the protein of interest may be obtained from genomic DNA, cDNA, from
defined oligonucleotides or from random nucleotides.

Usually the DP-proteins and DP-fusion proteins will be encoded by nucleic acids and are generated after
transcription thereof and translation of the corresponding mRNA. In one preferred embodiment,
concatemers of a nucleic acid encoding, for example, a DP fusion-peptide such as illustrated above
(DP_hyd-L_P-protein_1-L_P-DP_hyd-L_G-DP_Lys-L_P-protein_2-L_P-DP_Glu) can be inserted into suitable
cloning vectors (as detailed below), resulting in the generation of concatemerized DP-fusion proteins such as
(DP_hyd-L_P-protein_1-L_P-DP_hyd-L_G-DP_Lys-L_P-protein_2-L_P-DP_Glu)_n, wherein n is an integer of
at least 2. As will be obvious to those in the art, a plurality of DP fusion protein encoding nucleic acids other
than those illustrated herein, including bivalent and monovalent derivatives thereof, can be combined in suitable
vectors and the corresponding DP-proteins can be made.

In one embodiment, retroviral vectors are used to express the candidate DP-proteins and the nucleic acid
encoding the candidate DP-protein is generally RNA.

A particularly well suited retroviral transfection system is described in Mann et al., Cell 33:153-159 (1983);
Pear et al., Proc. Natl. Acad. Sci. USA 90(18):8392-6 (1993); Kitamura et al., Proc. Natl. Acad. Sci. USA
92:9146-9150 (1995); Kinsella et al., Hum. Gene Ther. 7:1405-1413 (1996); Hofmann et al., Proc. Natl.
Acad. Sci. USA 93:5185-5190 (1996); Choate et al., Hum. Gene Ther. 7:2247-53 (1996) and WO
94/19478 and PCT/US97/01019; and references cited therein, all of which are expressly incorporated by
reference.

Any number of suitable retroviral vectors may be used. Preferred retroviral expression vectors include
vectors based on the murine stem cell virus [MSCV; see Hawley et al., Gene Ther. 1:136-8 (1994)] and a
modified MFG virus [Riviere et al., Proc. Natl. Acad. Sci. USA 92:6733-7 (1995)], and pBABE (see PCT
US97/01019, incorporated by reference). Other suitable retroviral expression vectors are derived from
Moloney murine leukemia virus and include vectors such as pLNCX, pLXSN, pLAPSN; a self-inactivating
expression vector, such as pSIR; a bicistronic expression vector, such as pLXIN; and inducible expression
vectors, such as pRevTet-On, pRevTet-Off [Clontech Laboratories; see also Coffin and Varmus, in
Retroviruses (Cold Spring Harbor Laboratory Press, New York, 1996)].

As described above for other vectors, retroviral vectors may include inducible and constitutive promoters.
Constitutive promoters are preferred and include, but are not limited to, CMV, SV40, SRα, RSV, EF-1α,
UbC and TK.

Generally, the retroviral expression vectors may include one or more selection genes (also referred to as
selectable marker genes) under the control of internal ribosome entry sites (IRES), which allows for
bicistronic operons and thus greatly facilitates the selection of cells expressing fusion constructs at
uniformly high levels; and promoters driving expression of a second gene, placed in sense or anti-sense
relative to the 5' LTR.

Selection genes allow the selection of transformed host cells containing the vector, and particularly in the
case of mammalian cells, ensure the stability of the vector, since cells which do not contain the vector
will generally die. Selection genes are well known in the art and will vary with the host cell used. By
"selection gene" herein is meant any gene which either encodes a gene product that confers resistance to
a selection agent or encodes a marker allowing selection of the cell expressing this marker. Suitable
selection agents include, but are not limited to, neomycin (or its analog G418), blasticidin S, histidinol D,
bleomycin, puromycin, hygromycin B, and other drugs. Suitable marker genes, which can be inserted into
a bicistronic transcriptional unit (see above) and subsequently allow the identification of host cells
expressing a gene of interest include, but are not limited to, self-fluorescent markers such as green
fluorescent protein, enzymatic markers such as lacZ, and surface proteins such as CD8, etc.

As described for the other vectors, the retroviral vectors may comprise a variety of transcriptional and
translational regulatory sequences and at least one cloning site for the subcloning of at least one
recombinant DNA fragment.
The compositions of the invention are introduced into host cells to screen for bioactive agents capable of
altering the phenotype of a cell which expresses a gene of interest or protein of interest. By "introduced
into" or grammatical equivalents herein is meant that the nucleic acids enter the cells in a manner
suitable for subsequent expression of the nucleic acid. The method of introduction is largely dictated by
the targeted cell type, discussed below. Exemplary methods include CaPO4 precipitation, liposome
fusion, lipofectin®, electroporation, viral infection, etc. [see Kriegler, Gene Transfer and Expression: A 
Laboratory Manual (New York: Oxford University Press, 1991); Roth, Protein Expression in Animal Cells, 
Methods in Cell Biology Vol. 43 (San Diego: Academic Press, 1994); and Murray, Gene Transfer and 
Expression Protocols, Methods in Molecular Biology, Vol. 7 (Clifton: Humana Press, 1991)]. 

The compositions of the invention may stably integrate into the genome of the host cell (for example, 
when using retroviral particles), or may exist either transiently or stably in the cytoplasm (i.e. through the 
use of traditional plasmids, utilizing standard regulatory sequences, selection markers, etc.). As many 
pharmaceutically important screens require human or model mammalian cell targets, retroviral vectors
capable of transfecting such targets are preferred. 

As will be appreciated by those in the art, the type of cells used in the present invention can vary widely. 
Basically, any cell may be used, with mammalian cells being preferred, with mouse, rat, primate and 
human cells being particularly preferred. As is more fully described below, a screen will be set up such 
that the cells exhibit a selectable phenotype in the presence of a candidate DP-protein. As is more fully 
described below, cell types implicated in a wide variety of disease conditions are particularly useful, so 
long as a suitable screen may be designed to allow the selection of cells that exhibit an altered phenotype 
as a consequence of the presence of a candidate DP-protein within the cell. 

Accordingly, suitable cell types include, but are not limited to, tumor cells of all types (particularly 
melanoma, myeloid leukemia, carcinomas of the lung, breast, ovaries, colon, kidney, prostate, pancreas 
and testes), cardiomyocytes, endothelial cells, epithelial cells, lymphocytes (T-cell and B cell), mast cells,
eosinophils, vascular intimal cells, hepatocytes, leukocytes including mononuclear leukocytes, stem cells
such as haemopoietic, neural, skin, lung, kidney, liver and myocyte stem cells (for use in screening for
differentiation and de-differentiation factors), osteoclasts, chondrocytes and other connective tissue cells,
keratinocytes, melanocytes, liver cells, kidney cells, and adipocytes. Suitable cells also include known
research cells, including, but not limited to, Jurkat T cells, NIH 3T3 cells, CHO, Cos, etc. See the ATCC
cell line catalog, hereby expressly incorporated by reference. 

In one embodiment, the cells may be genetically engineered, that is, contain exogenous nucleic acid (for 
example, encoding a target molecule) in addition to the compositions of the invention. 
Once made, the compositions of the invention find use in a number of applications. The present invention
provides compositions which are useful to identify, both in vivo and in vitro, proteins capable of interacting
with, binding to or modulating the activity of a second protein. 

In a preferred embodiment the present invention provides methods and compositions to create, effectively 
introduce into cells and screen compounds that affect a signaling pathway. Little or no knowledge of the 
pathway is required, other than a presumed signaling event and an observable physiologic change in the 
target cell. The disclosed methods comprise an in vivo stratagem for accessing intracellular signaling
mechanisms. The invention also provides for the isolation of the constituents of the pathway, the tools to 
characterize the pathway, and lead compounds for pharmaceutical development. 

The present invention provides methods for the screening of compounds, referred to herein as DP- 
proteins, which are capable of altering the phenotype of cells comprising them. By "candidate DP-protein" 
herein is meant a DP-protein for which a function, an intrinsic property, or an interaction with a second 
protein is sought. While the "DP" component of candidate DP-proteins is generally not changed within a 
molecular library, the "protein" component of candidate DP-proteins is variable. 

In one embodiment, a plurality of candidate DP-proteins is provided in form of a molecular library. The 
term "molecular library" herein is meant to include a plurality of different DP-proteins, a plurality of isolated 
different nucleic acids encoding a plurality of different DP-proteins, and a plurality of different nucleic acids 
which encode a plurality of different DP-proteins and which are comprised by vectors. The methods of 
the present invention provide for the rapid in vivo screening of molecular libraries comprising large 
numbers of candidate DP-proteins, wherein the 'protein' components of DP-proteins are encoded by a 
candidate nucleic acid, comprising either random oligonucleotides, cDNA fragments or genomic DNA.
Thus, by delivering the random oligonucleotides, cDNA fragments and genomic DNA to cells, the cellular 
machinery generates the candidate DP-proteins. By screening the same cells, without the need to collect 
or synthesize in vitro the candidate DP-protein, highly efficient screening is accomplished. Thus, the 
present invention provides methods for screening a plurality of candidate DP-proteins, for effectors 
capable of altering the phenotype of a cell. 
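
To give a sense of scale for libraries with a randomized portion, the arithmetic can be sketched as follows. This is a back-of-the-envelope illustration, not a statement about any library described here: it assumes every position is drawn uniformly from the 20 standard amino acids and that clones are sampled independently, and the function names are hypothetical.

```python
import math


def peptide_diversity(n_residues, alphabet_size=20):
    """Number of distinct sequences for n fully randomized positions."""
    if n_residues < 1:
        raise ValueError("n_residues must be at least 1")
    return alphabet_size ** n_residues


def expected_coverage(n_residues, n_clones):
    """Expected fraction of all possible sequences present at least once.

    Uses the Poisson approximation 1 - exp(-clones / diversity), which
    assumes uniform, independent sampling of sequences.
    """
    diversity = peptide_diversity(n_residues)
    return -math.expm1(-n_clones / diversity)


if __name__ == "__main__":
    for n in (4, 6, 8, 10):
        print(n, peptide_diversity(n), expected_coverage(n, 10**7))
```

Under these assumptions the fraction of sequence space covered falls off quickly as the randomized portion grows, which is one reason biased rather than fully random libraries may be of interest.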

Signaling pathways in cells often involve an effector stimulus (e.g., chemokine, growth factor, hormone,
etc.) that leads to a phenotypically describable change in cellular physiology. Despite the key role
intracellular signaling pathways play in disease pathogenesis, in most cases, little is known about a
signaling pathway other than the initial stimulus and the ultimate cellular response. When peptides are
intracellularly expressed, they may modulate intracellular signaling pathways [Souroujon and Mochly-
Rosen, Nat. Biotechnol. 16(10):919-24 (1998)] and thus may participate in protein-protein interactions.
Molecular libraries of chemical compounds or peptides were screened for effector molecules that
modulate (e.g., up-regulate or down-regulate) signaling pathways. Thus constrained peptides contained
in minimized proteins may also be useful in the design of agents modulating intracellular protein-protein
interactions [Cunningham and Wells, Curr. Opin. Struct. Biol. 7:457-462 (1997)], which may offer a novel
method of regulating intracellular signaling pathways. If the peptides are expressed in live mammalian 
cells, for example by using retroviral vectors, they may be screened for defined changes in cellular 
phenotype, and the resulting active peptides may provide a route for the affinity isolation of their binding 
targets. 



Some form of conformationally constrained peptides may be useful and even necessary in displaying
peptides for intracellular combinatorial chemistry in live mammalian cells. Unlike peptides in phage 
display libraries, intracellular peptides may be subject to catabolism and thus preferably these peptides 
should be relatively inert to cellular proteases. Although intracellular peptide catabolism has not been well 
characterized, the ubiquitin-proteasome system is known to be involved in the degradation of proteins 
[Goldberg et al., Biol. Chem. 378:131-140 (1997); Hilt and Wolf, Trends Biochem. Sci. 21:96-102 (1996)]
and can act as a carboxy-octapeptidase. Further proteolysis, perhaps involving aminopeptidases, can 
result in the degradation of peptides to amino acids [Lee and Goldberg, Trends Cell Biol. 8:397-403 
(1998)]. In antigen presenting cells, short linear peptides resulting from cytoplasmic proteolysis can be 
removed to the endoplasmic reticulum by the peptide transporters TAP1 and TAP2 [Belich and 
Trowsdale, Mol. Biol. Rep. 21:53-56 (1995)]. 

Developing a scaffold for the intracellular display of expressed peptides which (i) is relatively inert to
proteolysis, resulting in enhanced intracellular stability and a higher steady state concentration of the
expressed protein, and (ii) is also small enough to allow access to binding sites on proteins, such as
active site crevices, may be very useful. The compact nature of this scaffold should decrease the
flexibility of the expressed protein and decrease the conformational entropy, effectively increasing the
concentration of individual conformers. This and the increased stability to proteolysis should in turn make
these scaffolds (e.g. when used as peptide libraries) more likely to contain active proteins, since the
higher concentrations should allow saturation of weaker binding interactions. This benefits screening
protocols to detect bioactive peptides, by allowing phenotypic selection of lower affinity peptides, and thus
allowing more bioactive peptides to be detected. Such features of enhanced proteolytic stability and
diminished conformational entropy may also make the more compact structure more attractive as a
potential therapeutic. Addition of specific short sequences to the N- and C-terminus of the peptide may
be useful for enhancing the above properties. A loop structure [Leszczynski and Rose, Science 234:849-855
(1986)] may be of particular interest, since loops are globular and compact, are common on protein
surfaces, and may be frequently involved in protein function and protein-protein interactions.

In a preferred embodiment, the compositions of the invention are used to screen for candidate bioactive
agents; that is, the test protein within the DP-protein (see above) is a candidate bioactive agent. The
candidate DP-proteins, as part of a molecular library, are introduced into suitable host cells to screen for
DP-proteins capable of altering the phenotype of the host cell harboring or expressing such a candidate
DP-protein. If necessary, the cells are subjected to conditions suitable for the expression of genes encoding
the candidate DP-proteins (for example, when inducible promoters are used), to produce the candidate
expression products.

In a preferred embodiment, a first plurality of cells is screened. That is, the cells into which a molecular 
library is introduced, which provides candidate DP-proteins, are screened for an altered phenotype. 
Thus, in this embodiment, the effect of the candidate DP-protein is seen in the same cells in which it is 
made; i.e. an autocrine effect. 

By a "plurality of cells" herein is meant roughly from about 10^3 cells to 10^8 or 10^9, with from 10^6 to 10^8
being preferred. This plurality of cells comprises a cellular library, wherein generally each cell within this 
cellular library contains a member of the molecular library, i.e. a different candidate DP-protein or a 
different DP-protein encoding nucleic acid, although as will be appreciated by those in the art, some cells 
within the cellular library may not contain a member of the molecular library, and some may contain more 
than one. When methods other than retroviral infection are used to introduce the candidate DP-protein 
into a plurality of cells, the distribution of candidate nucleic acids within the individual cell members of the 
cellular library may vary widely, as it is generally difficult to control the number of nucleic acids which 
enter a cell during electroporation, etc. 
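
The point that some cells receive no library member and some receive more than one can be illustrated numerically. The following is a minimal sketch only, assuming (this is not stated in the text) that the number of library members per cell follows a Poisson distribution with a chosen mean; the function names and the example means are hypothetical.

```python
import math


def poisson_pmf(k: int, mean: float) -> float:
    """Probability that a cell receives exactly k library members,
    under the ASSUMED Poisson model (not taken from the specification)."""
    return math.exp(-mean) * mean ** k / math.factorial(k)


def occupancy(mean_per_cell: float) -> dict:
    """Fractions of cells holding no member, exactly one member,
    or more than one member of the molecular library."""
    p_none = poisson_pmf(0, mean_per_cell)
    p_one = poisson_pmf(1, mean_per_cell)
    return {
        "none": p_none,
        "one": p_one,
        "more_than_one": 1.0 - p_none - p_one,
    }


if __name__ == "__main__":
    # Hypothetical mean numbers of library members delivered per cell.
    for mean in (0.1, 0.5, 1.0):
        fractions = occupancy(mean)
        print(mean, {name: round(value, 3) for name, value in fractions.items()})
```

Under this assumed model, lowering the mean number of members per cell reduces the fraction of cells carrying more than one member, at the cost of more cells carrying none.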

In a preferred embodiment, the molecular library is introduced into a first plurality of cells, and the effect of 
the expressed candidate DP-protein is screened in a second or third plurality of cells, different from the
first plurality of cells, i.e. generally a different cell type. That is, the effect of the candidate DP-protein is
due to an extracellular effect on a second cell; i.e. an endocrine or paracrine effect. This is done using 
standard techniques. The first plurality of cells may be grown in or on one media, and the media (referred 
to as "conditioned media") is allowed to touch a second plurality of cells, and the effect measured. 
Alternatively, there may be direct contact between the cells. Thus, "contacting" is functional contact, and 
includes both direct and indirect contact. In this embodiment, the first plurality of cells may or may not be
screened. 

Thus, the methods of the present invention comprise introducing a molecular library of randomized 
candidate nucleic acids into a plurality of cells, generating a cellular library. Each of the nucleic acids 
comprises a different, generally randomized, nucleotide sequence, encoding a different DP-protein. The 
plurality of cells is then screened, as is more fully outlined below, for a cell exhibiting an altered 
phenotype. The altered phenotype is due to the presence of a DP-protein. 

By "altered phenotype" or "changed physiology" or other grammatical equivalents herein is meant that the
phenotype of the cell is altered in some way, preferably in some detectable and/or measurable way. As
will be appreciated in the art, a strength of the present invention is the wide variety of cell types and
potential phenotypic changes which may be tested using the present methods. Accordingly, any
phenotypic change which may be observed, detected, or measured may be the basis of the screening
methods herein. Suitable phenotypic changes include, but are not limited to: gross physical changes
such as changes in cell morphology, cell growth, cell viability, adhesion to substrates or other cells, and
cellular density; changes in the expression of one or more RNAs, mRNAs, proteins, lipids, hormones,
cytokines, or other molecules; changes in the equilibrium state (i.e. half-life) of one or more RNAs,
mRNAs, proteins, lipids, hormones, cytokines, or other molecules; changes in the localization of one or
more RNAs, mRNAs, proteins, lipids, hormones, cytokines, or other molecules; changes in the bioactivity
or specific activity of one or more RNAs, mRNAs, proteins, lipids, hormones, cytokines, receptors, or
other molecules; changes in the secretion of ions, cytokines, hormones, growth factors, proteins, or other
molecules; alterations in cellular membrane potentials, polarization, integrity, or transport; changes in
infectivity, susceptibility, latency, adhesion, and uptake of viruses and bacterial pathogens; etc.

By "capable of altering the phenotype" or grammatical equivalents herein is meant that a candidate
DP-protein can change the phenotype of the cell in some detectable and/or measurable way.

The altered phenotype may be detected in a wide variety of ways, as is described more fully below and in
PCT/US97/01019, and will generally depend on and correspond to the phenotype that is being changed.
Generally, the changed phenotype is detected using, for example: microscopic analysis of cell
morphology; standard cell viability assays, including both increased cell death and increased cell viability,
for example, cells that are now resistant to cell death via virus, bacteria, or bacterial or synthetic toxins;
standard labeling assays such as fluorometric indicator assays for the presence or level of a particular
cell or molecule, including FACS or other dye staining techniques; biochemical detection of the
expression of target compounds after killing the cells; monitoring changes in gene expression within a
target cell, etc. In some cases, as is more fully described herein, the altered phenotype is detected in the
cell in which the molecular library comprising the randomized nucleic acid or randomized proteins was
introduced; in other embodiments, the altered phenotype is detected in a second cell which is responding
to some molecular signal from the first cell.

In a preferred embodiment, upon its translocation into the nucleus, the DP-protein modulates gene
expression, causing an increase or a decrease of expression of a target gene. In one embodiment, a
transcriptional activation protein binds to the DP-protein and thus may be either inactivated or prevented
from activating its target gene. In this embodiment, the DP-protein comprises a protein which has an
affinity to the target transcriptional activator, for example the HIV tat protein. In another embodiment, the
DP-protein may lead to an increased expression of a target gene, by virtue of comprising a protein component
which has an affinity to a transcriptional repressor. Upon binding of the transcriptional repressor to the
DP-protein, it may be either inactivated or prevented from binding to its target gene, thus leading to a
higher expression of the gene of interest.

In a preferred embodiment, once a cell with an altered phenotype is detected, the cell is isolated from the
plurality of cells which do not have altered phenotypes. This may be done in any number of ways, as is
known in the art, and will in some instances depend on the assay or screen. Suitable isolation techniques
include, but are not limited to, FACS, lysis selection using complement, cell cloning, scanning by
Fluorimager, expression of a "survival" protein, induced expression of a cell surface protein or other
molecule that can be rendered fluorescent or taggable for physical isolation; expression of an enzyme
that changes a non-fluorescent molecule to a fluorescent one; overgrowth against a background of no or
slow growth; death of cells and isolation of DNA or other cell vitality indicator dyes, etc.

In a preferred embodiment, the candidate nucleic acid encoding the candidate DP-protein and/or the
candidate DP-protein is isolated from the cell with an altered phenotype. This may be done in a number
of ways. In a preferred embodiment, primers complementary to DNA regions common to the vector, or to
specific components of the molecular library such as a rescue sequence, defined above, are used to
"rescue" the unique random nucleic acid encoding the candidate DP-protein. Alternatively, the candidate
DP-protein is isolated using a rescue sequence which is operably linked to the candidate DP-protein (as
described above). Thus, for example, rescue sequences comprising epitope tags or purification
sequences may be used to pull out the bioactive agent, using immunoprecipitation or affinity columns. In
some instances, as is outlined below, this may also pull out the primary target molecule, if there is a
sufficiently strong binding interaction between the bioactive agent and the target molecule. Alternatively,
the peptide may be detected using mass spectroscopy.

Once rescued, the sequence of the candidate nucleic acid encoding the candidate DP-protein and/or the
sequence of the candidate DP-protein is determined. This information can then be used in a number of 
ways. 

Often, when genomic libraries or cDNA libraries or DNA fragments obtained therefrom are employed in the
screening method outlined herein (i.e., when they are used to encode candidate DP-proteins), the nucleic
acid sequence encoding the test protein is not full-length, i.e., the nucleic acid sequence does not encode
the complete test protein. By "full-length" cDNA, gene, mRNA, RNA or grammatical equivalents herein is
meant any nucleic acid which encodes a complete protein as it is encoded by its corresponding cellular
genetic locus. In addition to the complete protein encoding sequence, a full-length cDNA, gene, mRNA or
RNA may optionally contain 5' and 3' untranslated nucleic acid sequences. The complete protein may
include amino acids incorporated by translation of the corresponding mRNA that may subsequently be
eliminated from the native protein, e.g. secretory signal peptide sequences or sequences involved in
protein splicing and protein processing. By "full-length protein" or grammatical equivalents herein is
meant a protein encoded by a full-length cDNA, gene, RNA or mRNA. As appreciated by those in the art,
full-length proteins may include posttranslational modifications, including, but not limited to, signal
peptide cleavage, protein splicing, protein precursor processing, glycosylation, and the like. Accordingly,
a "partial cDNA", "partial gene", "partial mRNA", "partial RNA" or a "partial protein" or grammatical
equivalents are meant to indicate a cDNA, gene, mRNA, RNA or a protein which represents a fragment of
a full-length cDNA, gene, mRNA, RNA or a protein. Accordingly, in a preferred embodiment, the
determined nucleic acid sequence information of the rescued partial protein will be used to isolate the
full-length coding sequence of the DP-protein. The isolation and characterization of a full-length coding
sequence using partial sequence information is well known in the art.

In a preferred embodiment, the nucleic acid encoding the candidate DP-protein, or a nucleic acid
encoding a full-length version thereof or any fragment of the full-length version, or a derivative of the
candidate DP-protein (see below), is reintroduced into the host cells, to verify the originally observed
altered phenotype of the cell. These cells may be the same as in the original screening experiment or
different. This may be done using retroviruses, or alternatively using fusions to the HIV-1 Tat protein and
analogs and related proteins, which allows very high uptake into target cells. See for example, Fawell et
al., Proc. Natl. Acad. Sci. USA 91:664-8 (1994); Frankel and Pabo, Cell 55:1189-93 (1988); Savion et al.,
J. Biol. Chem. 256:1149-54 (1981); Derossi et al., J. Biol. Chem. 269:10444-50 (1994); and Baldin et al.,
EMBO J. 9:1511-7 (1990), all of which are incorporated by reference.

In a preferred embodiment, a recombinant DP-protein is generated (as outlined further below) and used to
confirm the alteration of the phenotype of a target cell. This is a preferred embodiment when the
alteration of a phenotype was observed in a second or third plurality of cells as described above. That is,
the effect of the candidate DP-protein may be due to its secretion from a first cell, wherein it was
generated, followed by its binding to a cellular receptor on the second cell (i.e., different cell) or
internalization by a different means and subsequently exerting its effect in or on this second cell. In this
embodiment, the recombinant DP-protein or a derivative thereof is provided to the second cell and an
alteration of phenotype is monitored.

In a preferred embodiment, the nucleic acids encoding the DP-protein or a derivative thereof (referred to
herein also as protein of interest) are used to express the respective recombinant protein. A variety of
expression vectors, including viral and non-viral expression vectors, can be made which are useful for
recombinant protein expression in a variety of systems, including, but not limited to, yeast, bacteria,
archaebacteria, fungi, insect cells and animal cells, including mammalian cells.

The protein of interest may also be expressed as a fusion protein, including fusions to fusion partners, as
outlined before, or fusions to other protein sequences. Recombinant proteins of interest are produced by
culturing host cells into which nucleic acids encoding the protein of interest (generally as an expression
vector) are introduced, under the appropriate conditions that induce or cause expression of the
recombinant protein.



In a preferred embodiment, the recombinant protein is purified following expression. Numerous suitable
methods for recombinant protein expression, including generation of expression vectors, generation of
fusion proteins, introducing expression vectors into host cells, protein expression in host cells, and
purification methods are known to those in the art and are described, for example, in the following
textbooks: Ausubel et al., Short Protocols in Molecular Biology (John Wiley & Sons, Inc., 1995); O'Reilly
et al., Baculovirus Expression Vectors: A Laboratory Manual (New York: Oxford University Press, 1994);
Kriegler, Gene Transfer and Expression: A Laboratory Manual (New York: Oxford University Press,
1991); and Deutscher, Guide to Protein Purification, Methods in Enzymology Vol. 182 (San Diego:
Academic Press, Inc., 1990).

In a preferred embodiment, either the DP-protein or the nucleic acid encoding it is used to identify target
molecules, i.e. the molecules with which the DP-protein interacts. As will be appreciated by those in the
art, there may be primary target molecules, to which the DP-protein binds or acts upon directly, and there
may be secondary target molecules, which are part of the signaling pathway affected by the DP-protein.

In a preferred embodiment, the DP-protein is used to pull out target molecules. For example, as outlined
herein, if the target molecules are proteins, the use of epitope tags or purification sequences operably
linked to the DP-protein can allow the purification of primary target molecules via biochemical means
[co-immunoprecipitation, affinity columns, etc.; for example, see Deutscher, Guide to Protein Purification,
Methods in Enzymology Vol. 182 (San Diego: Academic Press, Inc., 1990); Harris and Angal, Protein
Purification Methods: A Practical Approach (Oxford: IRL Press at Oxford University Press, 1994); Harris
and Angal, Protein Purification Applications: A Practical Approach (Oxford: IRL Press at Oxford University
Press, 1990)]. Alternatively, the recombinant DP-protein, when expressed in bacteria and purified, can be
used as a probe against a cDNA expression library made from mRNA of the target cell type. Or,
DP-proteins can be used as a "bait" protein (e.g., when a DP-protein of defined sequence is employed in a
screening to identify unknown binding proteins) or as a "test" protein (e.g., when a known protein is
employed as a bait and screened against a molecular library comprising candidate DP-proteins) in either
yeast or mammalian two or three hybrid systems (e.g., see Fields and Song, Nature 340:245-6 (1989);
Vasavada et al., Proc. Natl. Acad. Sci. USA 88:10686-90 (1991); Fearon et al., Proc. Natl. Acad. Sci. USA
89:7958-62 (1992); Dang et al., Mol. Cell. Biol. 11:954-62 (1991); Chien et al., Proc. Natl. Acad. Sci. USA
88:9578-82 (1991); Luo et al., Bio/Techniques 22(2):350-352 (1997) and U.S. Patent Nos. 5,283,173,
5,667,973, 5,468,614, 5,525,490, and 5,637,463). Such interaction cloning approaches have been very
useful to isolate DNA-binding proteins and other interacting protein components. The DP-protein(s) can
be combined with other pharmacologic activators to study the epistatic relationships of signal transduction
pathways in question. It is also possible to synthetically prepare labeled DP-protein or a derivative
thereof and use it to screen a cDNA library expressed in bacteriophage, bacteria or eukaryotic cells for
those cDNAs which bind the DP-protein or its derivative. Furthermore, it is also possible to use cDNA
cloning via retroviral libraries to "complement" the effect induced by the DP-protein. In such a strategy,
the DP-protein would be required to be stoichiometrically titrating away some important factor for a
specific signaling pathway. If this molecule or activity is replenished by over-expression of a cDNA from
within a cDNA library, then one can clone the target. Similarly, cDNAs cloned by any of the above yeast
or bacteriophage systems can be reintroduced to mammalian cells in this manner to confirm that they act
to complement function in the system the peptide acts upon.

Once primary target molecules have been identified and validated, secondary target molecules may be
identified in the same manner, using the primary target as the "bait". In this manner, signaling pathways
may be elucidated. Similarly, bioactive agents specific for secondary target molecules may also be
discovered, to allow a number of bioactive agents to act on a single pathway, for example for combination
therapies.

In a preferred embodiment, a molecular library of recombinant DP-proteins is used in in vitro binding
assays to identify members that are capable of binding to a selected target protein, e.g., a receptor, a
ligand, an enzyme, etc.

Generally, in a preferred embodiment of the methods herein, a target protein (which can be a recombinant
protein or a naturally occurring protein) is non-diffusably bound to an insoluble support having isolated
sample receiving areas (e.g. a microtiter plate, an array, etc.). The insoluble supports may be made of
any composition to which the target protein can be bound, which is readily separated from soluble material
and is otherwise compatible with the overall method of screening. The surface of such supports may be solid
or porous and of any convenient shape. Examples of suitable insoluble supports include microtiter plates,
arrays, membranes and beads. These are typically made of glass, plastic (e.g., polystyrene),
polysaccharides, nylon or nitrocellulose, Teflon™, etc. Microtiter plates and arrays are especially
convenient because a large number of assays can be carried out simultaneously, using small amounts of
reagents and samples. The particular manner of binding of the target protein is not crucial so long as it is
compatible with the reagents and overall methods of the invention, maintains the characteristics of the
target protein and is nondiffusable. The target protein may be either bound directly to the insoluble
support (e.g. via cross-linking) or indirectly (e.g., via antibody, other protein or nucleic acid, etc.).
Preferred methods of binding include the use of antibodies (which do not sterically block the
protein-protein interaction surface for the test protein and preferably are directed against a tag polypeptide
which may be incorporated into the recombinant bait protein), direct binding to "sticky" or ionic supports,
chemical crosslinking, etc. Following binding of the target protein, excess unbound material is removed
by washing. The sample receiving areas may then be blocked through incubation with bovine serum
albumin (BSA), casein or other innocuous protein.

A molecular library comprising a plurality of recombinant DP-proteins is added to the binding assay. The
binding assay is performed at any temperature which facilitates optimal binding, typically between 4°C
and 40°C. Incubation periods are selected for optimal binding, but are also optimized to facilitate high
through-put screening. Typically between 0.1 and 1 hour is sufficient. Determination of the binding of
DP-proteins to the target protein may be done using a wide variety of assays, including labeled in vitro
protein-protein binding assays, electrophoretic mobility shift assays (EMSA), immunoassays for protein
binding, functional assays (phosphorylation assays, etc.) and the like (e.g., see Harlow and Lane,
Antibodies: A Laboratory Manual (New York, Cold Spring Harbor Laboratory Press, 1988) and Ausubel et
al., Short Protocols in Molecular Biology (John Wiley & Sons, Inc., 1995)).

The screening methods of the present invention may be useful to screen a large number of cell types
under a wide variety of conditions. Generally, the host cells are cells that are involved in disease states,
and they are tested or screened under conditions that normally result in undesirable consequences on the
cells. When a suitable bioactive agent is found, the undesirable effect may be reduced or eliminated.
Alternatively, normally desirable consequences may be reduced or eliminated, with an eye towards
elucidating the cellular mechanisms associated with the disease state or signaling pathway. These
screening methods are outlined in PCT/US97/01019, hereby incorporated by reference.

The following examples serve to more fully describe the manner of using the above-described invention,
as well as to set forth the best modes contemplated for carrying out various aspects of the invention. It is
understood that these examples in no way serve to limit the true scope of this invention, but rather are
presented for illustrative purposes. All references cited herein are incorporated by reference in their
entirety.

EXAMPLE 1 

Novel peptides which form observable dimers under harsh conditions. 

Upon infusion into the electrospray source of a Finnigan LCQ ion trap mass spectrometer of a 3 x 10 ^ M
pH 6.4 solution of EFLIVKS-amide, this peptide appears to self-associate to form dimers (fig. 3A),
detected at exactly two times the monomer molecular weight in the gas phase, after surviving an inlet
capillary temperature of 210°C and harsh electrospray conditions, and thus would be expected to
dimerize at significantly lower concentrations in aqueous solution. The peptide also forms dimers (also
detected by mass spectrometry) when eluted off a C18 reversed phase column at pH ~2.5 in ca. 25%
acetonitrile (fig. 3B). Comparison of its dimerization in fig. 3A with that of the test peptide SKVILFE (which
forms dimers in the range of 10^-13 M in aqueous solution (Bodenmuller et al., supra)), when both are
continuously infused by an electrospray interface into an ion trap mass spectrometer, suggests that both
peptides dimerize to a similar extent (within a factor of 10 or so). This suggests that EFLIVKS may
dimerize in aqueous solution at very low concentrations. The dimerization of EFLIVKS cannot be
predicted from dimerization of SKVILFE since reversed sequences are often used as inactive controls for
bioactive peptides.

LC/MS examination of the crude synthesis products from an all-single coupled Fmoc synthesis of
EFLIVKS-amide, for shorter sequences which can dimerize after electrospray ionization, is shown in fig. 4.
HPLC elution was with a gradient of 99.9% water-0.1% TFA to 99.9% acetonitrile-0.1% TFA. Dimers of
the following truncated sequences were detected by mass spectrometry, with the percent acetonitrile in
parentheses, all at ~pH 2.5: peak 1, LIVKS-amide (23.5%), monomer m/z = 705.3, dimer m/z = 1409.3;
peak 4, EFLIVKS-amide (25%), monomer m/z = 834.4, dimer m/z = 1667.3; and peak 5, EFLIVK-amide
(32%), monomer m/z = 747.4, dimer m/z = 1493.1. These results suggest that the N-terminal EF and
C-terminal S can be deleted without abolishing dimerization.
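
The reported monomer and dimer m/z values can be checked with simple arithmetic. The sketch below assumes, since the text does not state the ion types, that each monomer ion is singly protonated ([M+H]+) and each dimer ion is a singly protonated non-covalent dimer ([2M+H]+); the function and variable names are illustrative only.

```python
PROTON_MASS = 1.00728  # mass of a proton in Da


def expected_dimer_mz(monomer_mz: float) -> float:
    """Expected m/z of [2M+H]+ given the m/z of [M+H]+.

    ASSUMPTION: both ions carry a single proton (charge +1).
    """
    neutral_mass = monomer_mz - PROTON_MASS
    return 2.0 * neutral_mass + PROTON_MASS


# (peak label, reported monomer m/z, reported dimer m/z) from Example 1.
REPORTED_PEAKS = [
    ("peak 1", 705.3, 1409.3),
    ("peak 4", 834.4, 1667.3),
    ("peak 5", 747.4, 1493.1),
]


def compare(peaks):
    """Return (label, expected dimer m/z, observed dimer m/z, difference)."""
    rows = []
    for label, monomer_mz, observed_dimer_mz in peaks:
        expected = expected_dimer_mz(monomer_mz)
        rows.append((label, expected, observed_dimer_mz, observed_dimer_mz - expected))
    return rows


if __name__ == "__main__":
    for label, expected, observed, diff in compare(REPORTED_PEAKS):
        print(f"{label}: expected {expected:.1f}, observed {observed:.1f}, diff {diff:+.1f}")
```

Under these assumptions, each reported dimer value lies within 1 m/z unit of the value calculated from its monomer.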

Examination of a peptide designed to form a short beta sheet, VSIKFEL, shows that upon elution from a
C18 reversed phase column with mass spectrometry detection, the dimeric form of the peptide (m/z =
1667.5) is detected in addition to the monomeric form (m/z = 834.5) after electrospraying into the ion trap.
This suggests that this peptide, which contains alternating hydrophilic and hydrophobic residues and thus
may form a beta sheet, can also form stable dimers.

EXAMPLE 2

EFLIVKS can form compact proteolytically resistant structures when added to the N- and C-terminus of a
test 18mer polypeptide.

The peptide EFLIVKS, when fused to both the N- and C-terminus of a test 18mer polypeptide, can form a
compact structure of this polypeptide (referred to herein also as peptide 1). The 18mer polypeptide
sequence is VGTIVTMEYRIDRTRSFV, derived from the barley c2-chymotrypsin inhibitor [Leatherbarrow
and Salacinski, Biochemistry 30:10717-21 (1991)]. The analog of this peptide containing an N- and
C-terminal cysteine, in both cases substituted for valine, is thought to fold into a similar compact structure to
the loop present in barley chymotrypsin inhibitor-2. Such a compact structure should be a poor substrate
for proteases such as elastase, and in fact has been proposed as an inhibitor of elastase, chymotrypsin
and two variants of subtilisin. This disulfide-cyclized analog has been synthesized and tested by us and
is in fact a poor protease substrate, but a substrate nonetheless, and not an inhibitor. The linear peptide
CGTIVTMEYRIDRTRSFC is a good substrate for elastase, with ca. 15 peptides produced after a 3-hour
incubation (fig. 5A), with the proteolysis being monitored by reversed phase HPLC coupled to mass
spectrometry detection and identification of the proteolytic fragments. The same peptide with a disulfide
bond between the two cysteines is also a substrate for elastase, but with fewer peptide products and the
major initial cleavage occurring after the tyrosine (fig. 5B). Fusion of the dimerizer EFLIVKS onto both the
N- and C-terminus (EFLIVKS-VGTIVTMEYRIDRTRSFV-EFLIVKS-amide) creates a more proteolytically
resistant construct (fig. 5C), with little proteolysis evident after almost 3 hours, and very minor amounts of
a number of different cleavage products.

EXAMPLE 3 

Examination of the low energy conformers of peptide 1 

To examine the structural nature of the compact construct of peptide 1, low energy conformers were
obtained by a high temperature molecular dynamics-simulated annealing protocol similar to that published
by Nilges et al., Protein Engineering 2:27-38 (1988), as implemented in Discover 95. Structures were
saved every 2 psec (from different trajectories lasting from 400 psec to 1400 psec at 900 K), cooled to
300 K over 5 ps, and minimized using a distance-dependent dielectric constant (varying linearly between
ε = 80 at 80 Å and ε = 1 at 1 Å separation) with 200 steps of the steepest descent algorithm, then with as
many steps as necessary using the conjugate gradient algorithm to give a maximum derivative of less
than 0.001 kcal/Å. The resulting low energy structures were collected and compared from trajectories
starting from:

a) peptide 1, with the 18mer polypeptide started from its conformation present in barley chymotrypsin
inhibitor 2 [McPhalen and James, Biochemistry 26:261-269 (1987)], and subsequently minimized using
Discover 2.9.5, attached to the two dimerizers minimized from an extended conformation;

b) structures derived from a continuation of the trajectory in a) starting from the last structure, but with
the trajectory modified by the use of a different dseed (different initial velocities);

c) a continuation of the trajectory in b) with a third dseed;

d) a trajectory starting as in a) except with the dimerizers forced into a starting beta sheet structure;

e) a trajectory starting as in a) except with the dimerizers forced into a starting right handed alpha
helical conformation;

f) a trajectory starting from a fully extended peptide 1.

All structures within 20 kcal/mole of the lowest energy structure from trajectories a-f were collected and
compared. Figure 6 shows an overlay of the 45 lowest energy structures (only the peptide backbone is
shown) from all of the trajectories, after a least-squares alignment of the peptide backbones. All structures
when examined individually appear compact. Examination of the backbone conformations suggests that
the 18mer polypeptide folds onto the surface of the dimerizers in different ways. Space filling models
suggest that the resulting low energy structures are well-packed. This suggests that for polypeptide
lengths on the order of 18 residues, a library of these constructs may be a library of very small proteins or
compact structures. The relatively small size of these mini-proteins should allow facile nmr structure
determination and thus the establishment of structure-activity relationships. These compact low energy
conformers are also consistent with the observed inertness of this construct to elastase.
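
As an illustration only, the two numerical rules used in this example can be sketched in a few lines of
Python: the linear distance-dependent dielectric (ε = 1 at 1 Å rising to ε = 80 at 80 Å) and the selection
of structures lying within 20 kcal/mole of the lowest energy structure. The function names are
hypothetical and are not part of Discover; behavior outside the 1-80 Å range is not stated in the text and
is left unclamped here as an assumption.

```python
def dielectric(r_angstrom: float) -> float:
    """Linear interpolation through (1 A, eps = 1) and (80 A, eps = 80).

    Assumption: the line is simply extended outside 1-80 A.
    """
    r_lo, eps_lo = 1.0, 1.0
    r_hi, eps_hi = 80.0, 80.0
    slope = (eps_hi - eps_lo) / (r_hi - r_lo)
    return eps_lo + slope * (r_angstrom - r_lo)


def within_energy_window(energies, window=20.0):
    """Return indices of structures within `window` kcal/mole of the minimum."""
    e_min = min(energies)
    return [i for i, e in enumerate(energies) if e - e_min <= window]


if __name__ == "__main__":
    print(dielectric(40.0))
    # Hypothetical minimized energies (kcal/mole), for illustration only.
    print(within_energy_window([-100.0, -95.0, -79.0, -80.0]))
```

With these two end points the interpolation reduces to ε = r (r in Å), which is a common form of
distance-dependent dielectric.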

EXAMPLE 4

Estimation of the affinity of the folding peptides attached to the N- and C-terminus of a polypeptide
necessary to help form a compact structure

Unlike the strict requirements for very high affinity for efficacy of peptides which bind to a second peptide
sequence which is not covalently linked, the affinity requirements here for making a compact structure are
less demanding. High affinity peptides may well work, but are not required. The tethering of a second
copy of a homodimeric peptide at a fixed distance from the first dimerizer (separated by the polypeptide)
will result in a very high local concentration of the second dimerizer. An estimate of this local
concentration is derived for an 8mer polypeptide tethering the two together as follows: a linear 8mer is ca.
3 Å/residue x 8 residues or 24 Å long when fully extended. A rough estimate of the distance from a
second copy of the dimerizer would thus be in the range of 20 Å or less, since the peptide will not be fully
extended in all (or even many) conformations. A solution with a second attached copy of a dimerizer
every 20 Å away will have an effective concentration of ca. 0.2 mM. Thus peptides which form homo- or
heterodimers at 1/100 of this concentration, or 2 µM and below, will be 99% cyclized by such a dimerizer.
Thus any homo- or heterodimerizer with a binding constant (for itself or its dimeric partner) of 2 µM or
below will be sufficient for the formation of 99% cyclized peptides.

Based on minimized structures of peptide 1, the second copy of the dimerizer may be significantly closer
to the first copy than 20 Å, depending on the folded state of the polypeptide inserted between the folding
peptides. If on average it is 10 Å away, its local concentration will be roughly 1.6 mM, and 99% cyclized
peptides will be attained from dimerizers with self-binding constants of 16 µM or less.
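
The 99% figure follows from treating ring closure as a simple two-state equilibrium in which the tethered
dimerizer sees an effective concentration C_eff of its partner, so that the closed fraction is
C_eff / (C_eff + Kd). The short Python sketch below is an illustration under that assumption; the function
name is hypothetical, and the effective concentrations are taken as stated above rather than re-derived.

```python
def fraction_cyclized(c_eff_molar: float, kd_molar: float) -> float:
    """Closed (cyclized) fraction for a two-state intramolecular equilibrium.

    Assumption: the tethered partner behaves like a free ligand present
    at the effective concentration c_eff_molar.
    """
    return c_eff_molar / (c_eff_molar + kd_molar)


if __name__ == "__main__":
    # 20 A separation: C_eff ~ 0.2 mM with Kd = 2 uM (values from the text)
    print(round(100 * fraction_cyclized(0.2e-3, 2e-6), 1))
    # 10 A separation: C_eff ~ 1.6 mM with Kd = 16 uM (values from the text)
    print(round(100 * fraction_cyclized(1.6e-3, 16e-6), 1))
```

Whenever Kd is 1/100 of the effective concentration the closed fraction is 100/101, i.e. about 99%,
independent of the absolute concentrations.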






Example 5 
Synthesis of peptide constructs 






The following materials were obtained from the indicated sources: Protected N-α-Fmoc amino acid
derivatives were purchased from Advanced ChemTech (Louisville, KY) and all the peptide synthesis
reagents such as diisopropylcarbodiimide (DIC), N-hydroxybenzotriazole (HOBt),
2-(1H-benzotriazol-1-yl)-1,1,3,3-tetramethyluronium hexafluorophosphate (HBTU), trifluoroacetic acid
(TFA), N,N-diisopropylethylamine (DIPEA), piperidine, thioanisole, ethanedithiol and anisole were
obtained from Sigma (St. Louis, MO). Pre-loaded Fmoc-Xaa-Wang-resins and H-Pro-2-Cl-Trt-resin (to
synthesize C-terminal Pro peptides) were purchased from Novabiochem (La Jolla, CA). Organic solvents
such as dimethylformamide (DMF) and dichloromethane (DCM) were from Fisher Scientific (Santa Clara,
CA) and were of analytical grade.



The dimerizer scaffold peptides were synthesized on an automated Symphony/Multiplex multiple
peptide synthesizer of Protein Technologies Inc. (Tucson, AZ) following classical Fmoc chemistry. The
durations of the coupling (1.5 h/coupling) and deprotection (3 x 20 min) steps were slightly modified from
the existing default program to achieve the desired peptide in good yields. The pulsing rate of nitrogen gas
used to stir the resin mixture was carefully manipulated to ensure complete mixing of resin beads with the
added reagents. Standard Fmoc-compatible side-chain-protection groups such as tertiary-butyl (tBu) for
Ser, Thr, Glu, Asp, Tyr; trityl (Trt) group for Gln, His, Asn; tertiary-butyloxycarbonyl (Boc) group for Lys,
Trp were used for the respective amino acid derivatives. Similarly, the
2,2,4,6,7-pentamethyldihydrobenzofuran-5-sulfonyl (Pbf) group served as side-chain protection for Arg
[Fields and Fields, Tetrahedron Lett. 34:6661 (1993)]. The coupling reactions were carried out twice with
a five fold excess of Fmoc-protected α-amino acids in a mixture (50%, v/v) of DMF and DCM using the
DIC/HOBt mediated coupling procedure [Fields and Noble, Int. J. Pept. Protein Res. 35:161-214 (1990);
Hudson, J. Org. Chem. 53:617-624 (1988)]. In some cases (coupling of Arg, His and Lys), triple coupling
of amino acids was carried out in order to ensure completion of the reaction, and the HBTU/HOBt
coupling method was adopted in difficult situations [Knorr et al., Tetrahedron Lett. 30:1927-1930 (1989)].
Unreacted amino groups were capped with 50% acetic anhydride in DMF. After coupling five amino acids
in an automated mode per sequence, synthesis was paused and completion of the coupling reaction was
checked by Kaiser's test [Kaiser et al., Anal. Biochem. 34:595-598 (1970)] before the synthesis proceeded
further. At the end of the synthesis, the Fmoc group on the N-terminus was deprotected in 30 min with
25% piperidine in DMF and the resin was washed extensively with DCM followed by absolute ethanol.
After the ethanol wash, peptides were cleaved from the resin in manual mode by treatment with King's
cleavage cocktail (Reagent K) composed of TFA (82.5%)/phenol (5%)/thioanisole (5%)/ethanedithiol
(2.5%)/water (5%) for 3 h at ambient temperature, during which all the side-chain protections were also
removed simultaneously [King et al., Int. J. Pept. Protein Res. 36:255-266 (1990)]. After the cleavage,
crude peptides were precipitated using cold diethylether, after which the precipitate was solubilized in
water/acetonitrile solution and lyophilized as described previously [Gururaja and Levine, Peptide Res.
9:283-289 (1996)].

The lyophilized crude peptide extracts were purified to homogeneity by reversed-phase high performance
liquid chromatography (RP-HPLC) (Hewlett-Packard model 1100 series HPLC system having UV variable
wavelength DAD detector; San Francisco, CA, USA) using a semi-preparative Rainin (Woburn, MA)
Dynamax 60 Å reversed-phase cyano column (10 x 250 mm) coupled to a guard column (10 x 50 mm).
The mobile phase consisted of A: 0.1% TFA in water and B: 0.1% TFA in acetonitrile. A linear gradient of
0-40% buffer B in 40 min was employed to elute the peptide at a flow rate of 2.0 ml/min using dual
wavelength detection mode at 230 and 280 nm as described previously (Gururaja and Levine, supra).
Fractions containing pure peptide were pooled and lyophilized. The integrity and identity of all the purified
synthetic peptides were confirmed by on-line electrospray ionization-mass spectroscopy (ESI-MS),
wherein the HPLC column outlet was connected directly to a Finnigan LCQ mass spectrometer (San
Jose, CA, USA) equipped with the standard ESI source. Mass spectrometric data were in good
agreement with the expected values. The peptide EFLIVKS-STKSIPPQS-EFLIVKS, used for nmr studies,
was over 99% pure as judged by LC/MS.

Example 6

Peptide dimerization observed by mass spectrometry 



Peptides were dissolved from a lyophilized white powder into water at pH 5.0, and the pH of the most
concentrated stock was checked. After initial observation of peptide dimers when purifying the crude
peptide, the Finnigan (San Jose, CA) LCQ mass spectrometer was tuned to optimize the signal intensity
of the dimer at pH 5.0. The optimal parameters were: heated inlet capillary, 130-150°C; source voltage,
4.0 kV; capillary voltage, 38 V; tube lens offset, 24 V; sheath gas, 40-80 l/min; and auxiliary gas, 20 l/min.
All binding measurements were made using a continuous infusion rate of 5-15 µl/min. The relative ion
current of the dimeric peptides was calculated as Σ(intensity of all dimer ions) / [Σ(all dimer ions) + Σ(all
monomer ions)]; sodium adduct ions were included when observed.
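
For illustration, the relative ion current defined above can be written as a small Python function. This
is a minimal sketch: the function name and the example intensities are hypothetical, and the caller is
assumed to have already assigned each peak (including any sodium adducts) to the monomer or the dimer.

```python
def relative_dimer_current(dimer_intensities, monomer_intensities):
    """Sum(dimer ions) / [Sum(dimer ions) + Sum(monomer ions)]."""
    dimer = sum(dimer_intensities)
    monomer = sum(monomer_intensities)
    total = dimer + monomer
    if total == 0:
        raise ValueError("no ion intensity to normalize")
    return dimer / total


if __name__ == "__main__":
    # Hypothetical peak intensities (arbitrary units); adducts listed separately.
    dimer_peaks = [30.0, 10.0]      # e.g. protonated dimer and its sodium adduct
    monomer_peaks = [50.0, 10.0]    # e.g. protonated monomer and its sodium adduct
    print(relative_dimer_current(dimer_peaks, monomer_peaks))
```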

To construct a scaffold with a self-associating peptide at each end, we examined variants of part of the
sequence of a proposed self-associating peptide hormone, the neuropeptide head activator (Bodenmuller
et al., supra). After some preliminary tests, an analog which contains a reversed sequence of part of the
peptide was used for further studies. To establish the stoichiometry of binding of this analog, EFLIVKS,
the self association was examined by electrospray mass spectrometry. For binding studies the mass
spectrometer was tuned on the dimeric version of the peptide (see Figure 3 and data not shown). No
evidence for a trimeric or tetrameric peptide association was found. When the relative ion current of the
dimer was plotted against the concentration of peptide infused into the source, a curve which is well fit by
a rectangular hyperbola was seen (data not shown). This saturable dimer formation yields an apparent
binding constant for EFLIVKS of 7.8 µM. This experiment was repeated with different analogs of this
sequence (Table 1). Replacing the N-terminal glu with lys did not significantly change the binding,
suggesting that these peptides do not dimerize simply by forming reciprocal lys-glu interactions. Addition
of glu and lys to make EEFLIVKKS results in an apparent 4-fold increase in self-binding affinity. Each
residue was individually replaced with alanine, and dimerization monitored. Replacement of the
N-terminal glu and C-terminal ser had little effect on the apparent dimerization constant, while ala
replacement of F2, L3, I4, V5 and K6 weakened binding 5.4-fold, 9.7-fold, 5.4-fold, 10-fold and over
6-fold, respectively.

Table 1. Self-dimerization constants measured by mass spectrometry.

peptide sequence     Kd (app) µM     peptide sequence     Kd (app) µM
EFLIVKS              7.8             AFLIVKS              12
KFLIVKS              6.8             EALIVKS              42
VSIKFEL              13              EFAIVKS              76
SKVILFE              12              EFLAVKS              42
EEFLIVKKS-acid       2.1             EFLIAKS              82
                                     EFLIVAS              >50
                                     EFLIVKA              9.9
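
The fold changes quoted for the alanine scan can be reproduced directly from the Table 1 values. The
Python sketch below is only an illustration: the dictionary and function names are hypothetical, and
EFLIVAS is omitted because only a lower bound (>50 µM) is reported for it.

```python
# Apparent self-dimerization constants from Table 1 (micromolar).
KD_APP_UM = {
    "EFLIVKS": 7.8,
    "AFLIVKS": 12.0,
    "EALIVKS": 42.0,
    "EFAIVKS": 76.0,
    "EFLAVKS": 42.0,
    "EFLIAKS": 82.0,
    "EFLIVKA": 9.9,
    "EEFLIVKKS-acid": 2.1,
}


def fold_weakening(variant: str, reference: str = "EFLIVKS") -> float:
    """Ratio Kd(variant) / Kd(reference); values above 1 mean weaker binding."""
    return KD_APP_UM[variant] / KD_APP_UM[reference]


if __name__ == "__main__":
    for name in ("EALIVKS", "EFAIVKS", "EFLAVKS", "EFLIAKS"):
        print(name, round(fold_weakening(name), 1))
```

A ratio below 1, as for EEFLIVKKS-acid, corresponds to tighter apparent binding than the reference.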



Example 7 

Inhibition of elastase by peptide constructs 

The activity of 100 nM porcine pancreatic elastase (Sigma Chemical Co., St. Louis, MO) in 0.1 M Tris
buffer, pH 7.88, at 25°C was followed by cleavage of 100 µM succinyl-ala-ala-ala-p-nitroanilide at 412 nm
for 1-2 min. The assay kinetics were linear over this time. Inhibition by different peptide constructs was
followed by preincubation of 10 µM peptide with elastase for 1-1.5 min, followed by addition of the
substrate. Percent inhibition was calculated as [1 - {assay slope (peptide + elastase) / assay slope
(elastase alone)}] x 100%. These assay conditions are identical to those used by Leatherbarrow and
Salacinski, supra.

As a test sequence for insertion between the N- and C-terminal peptide sequence EFLIVKS used to
constrain the test sequence, we chose a variant of the sequence of the 18mer Ci2b protease-inhibitory
loop. This sequence has been reported (Leatherbarrow and Salacinski, supra) to be a very potent
inhibitor of subtilisin, chymotrypsin and elastase. To test this, we assayed the inhibition of elastase under
identical conditions to those reported with both the disulfide-cyclized form as well as the
EFLIVKS-constrained analog. The results are shown in Table 2. At a concentration of 500 nM, the
disulfide-cyclized peptide cyclic[CGTIVTMEYRIDRTRSFC] causes only a slight inhibition of porcine
pancreatic elastase, 6.2% (n = 3). Based on its reported apparent inhibition constant of 390 pM
(Leatherbarrow and Salacinski, supra), and the concentration and Km of the substrate used in the assay,
an inhibition of 99.9% would be estimated assuming this putative inhibitory peptide is competitive with
substrate. The same 18mer sequence, with the N- and C-terminal val of the native sequence substituted
for the cysteines, was also tested with different combinations of dimerizer peptides fused to its N- and
C-termini. None gave significant inhibition of elastase (Table 2).
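
The two calculations used in this example can be sketched as follows. The first function is the
slope-ratio definition of percent inhibition given above; the second is the standard expression for a
competitive inhibitor, 100[I] / ([I] + Ki(1 + [S]/Km)). This is an illustration only: the function names are
hypothetical, the Km of 1 mM used in the demonstration is an assumed placeholder (the text does not state
the value used), and depletion of free inhibitor by enzyme binding is ignored.

```python
def percent_inhibition(slope_with_peptide: float, slope_enzyme_alone: float) -> float:
    """[1 - slope(peptide + elastase) / slope(elastase alone)] x 100%."""
    return (1.0 - slope_with_peptide / slope_enzyme_alone) * 100.0


def expected_competitive_inhibition(inhibitor: float, ki: float,
                                    substrate: float, km: float) -> float:
    """Percent inhibition expected for a competitive inhibitor.

    All concentrations and constants must share the same units.
    """
    return 100.0 * inhibitor / (inhibitor + ki * (1.0 + substrate / km))


if __name__ == "__main__":
    # Hypothetical assay slopes (absorbance units per minute).
    print(round(percent_inhibition(0.0938, 0.1000), 1))
    # 500 nM peptide, Ki = 390 pM, 100 uM substrate; Km = 1 mM is an ASSUMED value.
    expected = expected_competitive_inhibition(500e-9, 390e-12, 100e-6, 1e-3)
    print(round(expected, 1))
```

Because the inhibitor concentration exceeds the reported Ki by more than a thousandfold, the expected
value stays close to 99.9% for any Km that is not much smaller than the substrate concentration.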



Table 2. Peptide inhibition of porcine pancreatic elastase*.

assay                                            inhibition      expected     n
50 nM enzyme alone                               0%              0%           5
+ 500 nM cyclic[CGTIVTMEYRIDRTRSFC]              6.2 ± 1.2       99.9**       3
+ 10 µM EFLIVKSVGTIVTMEYRIDRTRSFVEFLIVKS         -4.3 ± 0.3      ?            3
+ 10 µM EFLIVKS-18mer insert-SKVILFE             7.0 ± 1.0       ?            3
+ 10 µM SKVILFE-18mer insert-EFLIVKS             2.4 ± 0.29      ?            3
+ 10 µM SKVILFE-18mer insert-SKVILFE             -1.4 ± 0.14
+ 2 mM PMSF                                      100             100

*assay at pH 7.88, 25°C, using 100 µM succinyl-ala-ala-ala-p-nitroanilide as a substrate, observed at
412 nm. Each value is derived from 3-5 replicates.

**estimated as % = 100[inhibitor] / ([inhibitor] + Ki(1 + [substrate]/Km)), assuming the peptide is a
competitive inhibitor with a Ki of 390 pM.

Example 8
Elastolysis of peptide constructs

To examine the effects of elastase on different peptides, purified synthetic peptides (10 µM) were
dissolved in 0.1 M Tris pH 7.88, at 25°C. Elastase was added to 100 nM. At time 0, 15 min, 1, 2, 3 and
24 hours, an aliquot of the reaction mixture was injected onto a 0.1 x 25 cm C18 reversed phase hplc
column (Vydac Inc., Hesperia, CA). The reaction mixture was eluted using a gradient of 100% A (99.9%
H2O, 0.1% v/v trifluoroacetic acid) for 10 min, followed by a 1%/min increasing gradient of B
(acetonitrile, 0.1% trifluoroacetic acid) to 60% B, followed by a 5%/min gradient of B to 100%. Peptides
were examined by direct elution from the column into the source of a Finnigan LCQ ion trap mass
spectrometer. Peptides were scanned from 300-2000 amu, and identified by matching their masses
against those of different fragments of the full length peptide, or by comparing their masses with the
masses of expected elastolytic fragments in the case of the cyclic peptide, using MacBioSpec (obtained
courtesy of PE-Sciex, Foster City, CA). Proteolysis of the reduced peptide CGTIVTMEYRIDRTRSFC was
done in the presence of 2 mM dithiothreitol (Sigma). Cleavage products of the oxidized peptide were
either directly chromatographed without reduction, or chromatographed after an aliquot was treated first
with 1 mM PMSF for 1 hour and then with 30 mM DTT for 10 min.

To further examine the reason for the lack of expected elastase-inhibitory activity of cyclic
[CGTIVTMEYRIDRTRSFC], and to examine the elastolytic stability of some of the peptide constructs,
we incubated each peptide at a concentration of 10 µM with 100 nM elastase for 3 hours at pH
7.8, 25°C. Each reaction mixture was then chromatographed over a microbore C18 reversed phase
column, and the peptide fragments were identified using mass spectrometry. The cyclic peptide reaction
was examined either with or without subsequent reduction (data not shown). The linear peptide
CGTIVTMEYRIDRTRSFC was highly susceptible to elastolysis, giving ca. 11 different identifiable
peptides (Table 3). The main cleavage was after Y9 and additional cleavages were after I4, V5, T6, M7,
T14 and F17. Cyclic [CGTIVTMEYRIDRTRSFC] appeared to be cleaved more slowly than its linear
analog, and after 3 hours was cleaved at fewer sites, mainly after Y9, and also after M7, T14 and F17.

The Ci2b loop 18mer with EFLIVKS attached to each end was not attacked to a significant degree by
elastase after 3 hours, with only cleavage after Y16 initially being observed. Most low level peaks
observed in the chromatogram are mainly synthetic impurities also present in the absence of added
elastase. After 24 hours, enough proteolysis occurred to assign additional elastolytic sites in this
construct as after V5, M14, T21, F27 and I29. Thus cleavage occurred at many of the same residues as
in the linear and cyclic peptides, but at a much reduced rate.
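
The mass matching used to assign fragments can be illustrated with a short Python sketch that computes
the monoisotopic mass of a singly protonated peptide from standard residue masses. This is not the
MacBioSpec procedure, whose internals are not described here; the function name is hypothetical, and the
assumption that the tabulated expected masses refer to singly protonated ions is ours (it reproduces, for
example, 1016.44 for CGTIVTMEY and 1050.58 for RIDRTRSF).

```python
# Standard monoisotopic residue masses (Da).
MONO = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.01056        # H2O added for the free N- and C-termini
PROTON = 1.00728        # charge carrier for [M+H]+
AMIDE_SHIFT = -0.98402  # replacing the C-terminal OH with NH2


def mh_plus(sequence: str, c_terminal_amide: bool = False) -> float:
    """Monoisotopic [M+H]+ of a linear peptide with free (reduced) cysteines."""
    mass = sum(MONO[aa] for aa in sequence) + WATER + PROTON
    if c_terminal_amide:
        mass += AMIDE_SHIFT
    return mass


if __name__ == "__main__":
    print(round(mh_plus("CGTIVTMEY"), 2))
    print(round(mh_plus("RIDRTRSF"), 2))
    # C-terminal fragment of the amidated dimerizer construct
    print(round(mh_plus("RIDRTRSFVEFLIVKS", c_terminal_amide=True), 2))
```

An observed peak can then be assigned by scanning all contiguous subsequences of the parent peptide and
keeping those whose calculated mass agrees with the observation within the instrument tolerance.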

Table 3. Identification of elastolytic peptides of the Ci2b loop peptide and analogs.

peptide substrate*                  peptide substrate*                     peptide substrate*
linear CGTIVTMEYRIDRTRSFC           cyclic [CGTIVTMEYRIDRTRSFC]            EFLIVKS-Ci2b insert-EFLIVKS

parent peptide 2150.2/2151.0        parent peptide 2149.8/2150.5           parent peptide 3775.2/3775.1
CGTIVTMEY 1016.3/1016.44            c[CGTIVTMEYRIDRTRSFC] 2168.3/2168.5    EFLIVKSVGTIVTMEY 1828.6/1828.98
                                    (single cleavage in ring)
RIDRTRSF 1050.6/1050.58             CGTIVTMEY 1016.3/1016.44               RIDRTRSFVEFLIVKS 1965.1/1965.14
RIDRTRSFC 1152.6/1153.54            RIDRTRSFC 1153.3/1152.6                EFLIV or VEFLI 620.2/620.37
CGTIVTMEYRIDRTRSF 2048.2/2048.0     RIDRTRSF 1050.6/1050.58                VEFLI or EFLIV 620.1/620.37
                                                                           (second peak; both may be present)
CGTIVTMEYRIDRT 1658.0/1658.0        EYRIDRT 952.4/952.49                   RSFVEF 784.4/784.4
CGTIVTM 724.2/724.34                                                       KSVGTIVTM 935.4/935.52
EYRIDRTRSFC 1444.9/1445.7
EYRIDRT 952.4/952.49
VTMEYRIDRTRSF 1673.8/1673.84
MEYRIDRTRSFC 1576.9/1576.74
MEYRIDRT 1083.6/1083.53
TMEYRIDRT 1184.6/1184.57

*presented as peptide fragment, observed monoisotopic mass/expected monoisotopic mass

Example 9 

Deuterium exchange experiments using constrained loop peptides 



Deuterium exchange experiments were carried out by dissolving the peptide of interest in water at pH 5,
and diluting the peptide 10-fold into D2O at t = 0. For the initial constructs tested, the peptide
concentrations after dilution in D2O were in the range of 10 µM. For other time points, an aliquot of the
peptide solution was quenched by addition of a 2.5-fold volume excess of 1:1 H2O:MeCN with 1% formic
acid at 0°C or 25°C and immediately infused into the mass spectrometer. This acidic pH jump slows the
rate of amide bond hydrogen exchange with solvent. For selected time points the mass derived from the
first 2 min of the infusion was compared to that of later 2 min blocks to assess the significance of
back-exchange, which was usually 1 proton or less. The total number of exchangeable protons was
derived by 1) initially dissolving the peptide in DMSO, and diluting it directly into D2O before quenching
and measurement of the new mass of the peptide several minutes later; 2) diluting a peptide dissolved in
5% DMSO 10-fold into D2O; or 3) heating the solution of peptide diluted into D2O at 100°C for 15 min.
DMSO was included since in preliminary experiments low levels added to aqueous peptides appeared to
greatly accelerate proton exchange. When all three methods were used with
EFLIVKS-VGTIVTMEYRIDRTRSFV-EFLIVKS, they gave the same results. Calculation of the total protons
exchanged included correction for the 10% by volume of H2O present after dilution in D2O. For peptides
which were soluble in the 1 mM range, samples from the 10-fold D2O solution at pH 5 were directly
infused into the mass spectrometer without quenching. Rate constants and amplitudes for deuterium
exchange were derived by fitting the time course of the gain in mass above the fully protonated form to a
single exponential function.
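
The quantities described above can be sketched in Python as follows. This is an illustration, not the
fitting procedure actually used: the function names are hypothetical, and the solvent correction assumes
that each exchanged site equilibrates to the deuterium fraction of the solvent (0.9 after a 10-fold
dilution), which the text does not spell out.

```python
import math


def deuterium_fraction(dilution_factor: float = 10.0) -> float:
    """Fraction of D in the solvent after diluting an H2O stock into D2O."""
    return 1.0 - 1.0 / dilution_factor


def corrected_protons(observed_mass_gain_da: float,
                      dilution_factor: float = 10.0) -> float:
    """Sites exchanged, assuming each contributes ~1 Da x solvent D fraction."""
    return observed_mass_gain_da / deuterium_fraction(dilution_factor)


def mass_gain(t_hours: float, amplitude: float, k_per_hour: float) -> float:
    """Single exponential approach to `amplitude`: A * (1 - exp(-k t))."""
    return amplitude * (1.0 - math.exp(-k_per_hour * t_hours))


def half_life(k_per_hour: float) -> float:
    """Time for half of a kinetic phase to exchange."""
    return math.log(2.0) / k_per_hour


if __name__ == "__main__":
    # Hypothetical phase: 16 protons exchanging with k = 0.05 per hour.
    for t in (1.0, 10.0, 24.0):
        print(t, round(mass_gain(t, 16.0, 0.05), 2))
    print(round(half_life(0.05), 1))
```

In practice the amplitude and rate constant would be obtained by least-squares fitting of `mass_gain` to
the measured time course.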






Relative to surface-exposed residues, the amide backbone protons of peptides and proteins will
exchange more slowly with deuterated water when they are buried in the interior of a protein (and
inaccessible to water) or are involved in stable hydrogen bonding [Englander et al., Protein Science
6:1101-1109 (1997)]. Mass spectrometry has been used to examine the hydrogen exchange properties of
a variety of different proteins (see Chung et al., Protein Science 6:1316-24 (1997) and Smith et al., J.
Mass Spectrometry 32:135-146 (1997) for recent examples), and the existence of slowly exchanging
protons has been used to infer the existence of tertiary structure [McKnight et al., J. Mol. Biol. 12:126-34
(1996)]. Deuterium exchange studies here were done at pH 5 since below pH 4.5 the constrained loops
do not appear to retain structure as measured by circular dichroism (vide infra). To examine the
compactness of the peptide dimerizer-constrained Ci2b loop peptide, the rate and stoichiometry of
deuterium incorporation upon dilution into D2O was examined. The results for a variety of different
constructs are summarized in Table 4.



Table 4. Deuterium exchange rates and amplitudes for constrained Ci2b peptide insert and other peptide
inserts

dimerizer (1)                   insert                 total            proton exchange        exchange rate
                                                       exchangeable     amplitudes (3)         constants
                                                       protons

EFLIVKS                         18mer (2)              66.5 ± 1.4       29.3 ± 1.5 fast        k intermed = 0.054 hr⁻¹
                                                                        16 intermediate
                                                                        21 slow
EEFLIVKKS                       18mer                  70.3             39.4 fast              k intermed = 0.15 hr⁻¹
                                                                        7.9 intermediate
                                                                        23 slow
MGEFLIVKS-insert-EFLIVKSGPP     18mer
K6G4-EFLIVKS-insert-EFLIVKS     18mer                  87.7             82.8 fast
K3GSGS-EFLIVKS-insert-
EFLIVKS-GSGSK3
EFLKVKS                         18mer                  71.6             70.1 fast
SKVILFE                         18mer                  66               31 fast                k intermed = 0.041 hr⁻¹
                                                                        17 intermediate
                                                                        18 slow
KFLIVKS                         18mer
EFLIVKS                         -STKSIPPQS-            36.1 ± 1.9       32.6 ± 3.8 fast
MGEFLIVKS-insert-EFLIVKSGPP     -G4DYKDDDDKG4-         35.9             34.8
MGEFLIVKS-insert-EFLIVKSGPP     -G4YPYDVPDYASLG3-      40.3             40.3

(1) data is presented for peptides of the form dimerizer-insert-dimerizer; the dimerizer sequence is the
same at the N- and the C-terminus except as noted

(2) the 18mer standard insert is the Ci2b sequence -VGTIVTMEYRIDRTRSFV-

(3) the fast phase amplitude is calculated for the fastest exchange data, lasting at most ca. 1 hour

The kinetics of deuterium exchange for protons for the construct EFLIVKS-VGTIVTMEYRIDRTRSFV-
EFLIVKS-amide were determined (data not shown). A total of 66.5 Da was added to the time zero mass
of the peptide upon complete proton exchange. In the fast phase of proton exchange, 29 protons
exchanged; roughly 33 side chain, N- and C-terminal protons would be expected to be rapidly
exchangeable at this pH if exposed to water. A further 16 protons exchanged at an intermediate rate with
a rate constant of 0.054 hr⁻¹. This left 21 protons which are presumed to exchange even more slowly
than observable on this time scale. Both classes of protons exchange at a rate slower than measured for
surface-exposed protons, taken from nearest neighbors identical to those found in EFLIVKS [Bai et al.,
Proteins: Structure, Function and Genetics 17:75-86 (1993)]. These control protons exchanged with
deuterium with rate constants in the range of 6-60 hr⁻¹ at pH 5. Similar results were obtained with the
reversed dimerizer sequence attached to the same end of the 18mer insert. With this 18mer insert,
similar results were also obtained with an apparently more potent dimerizer (Table 1) attached to each end
of the insert, EEFLIVKKS. In this case, 39 of the 70 exchangeable protons exchanged with deuterium
within an hour, 8 protons exchanged with a rate constant of 0.15 hr⁻¹, and the remaining 23 protons
exchanged more slowly than this. The total of 31 slowly exchanging protons in this analog was somewhat
less than the 37 protons in the parent peptide, suggesting some subtle changes in structure between the
two constructs. For the peptide analog with lys6-gly4- fused to the N-terminus of the parent peptide,
designed to enhance the solubility of the peptide, all but ca. 5 protons exchanged within an hour. This
N-terminal fusion may thus destabilize the structure or at least make it more mobile.

The side chain of the isoleucine normally at the 4th position in this peptide appeared in the low energy
conformers obtained from high temperature molecular dynamics trajectories (vide infra) to be buried in the
folded peptide. Thus we created a single point mutation at the 4th position in each dimerizer (creating
EFLKVKS), and examined the effect of this mutation on the 18mer insert structure by deuterium
exchange. If this mutation disrupted the structure in a significant way, the number of slowly exchanging
protons might be diminished. When the deuterium exchange kinetics were examined, all but one proton
exchanged within an hour.

We also examined the effect of insert sequences different from Ci2b on the exchange kinetics of the
overall peptide when EFLIVKS or its analogs were fused to both termini of these inserts. One insert,
STKSIPPQS, represented an analog of the protease inhibitor cyclic[CTKSIPPQC] [Gariani and
Leatherbarrow, J. Peptide Res. 49:467-75 (1997)]. This short construct had 36 exchangeable protons
when heated, 33 of which exchanged in an hour. Thus if this peptide has a folded structure, it contains
only a few slowly exchanging protons. A second insert included the flag epitope tag, DYKDDDDK,
flanked by four glycines on each end to increase its flexibility so as to allow binding of the epitope to
anti-flag antibodies. This was fused to MGEFLIVKS- at the N-terminus and -EFLIVKSGPP at the C-terminus.
This peptide had 36 exchangeable protons, 35 of which exchanged within an hour. A similar construct,
also expected to be somewhat flexible due to the presence of 7 glycines, was synthesized with the
influenza hemagglutinin epitope tag replacing the flag tag. All protons were exchanged for deuterium
within ca. 1 hour. Thus a shorter inserted sequence, or inserts with multiple glycines at each end to allow
flexibility of the insert, did not have slowly exchanging protons.

Example 10 
Models of peptide constructs 

None of the peptide dimerizer constructs examined so far that have slowly exchanging protons has been
soluble in the millimolar range; thus structure determination by nmr is not readily accomplished. To derive
a working model of selected structures, which can be roughly compared to secondary structure content
derived from circular dichroism, we used high temperature molecular dynamics to generate conformers
[Brooks, Chem. Scripta 29A:165-169 (1989); Bruccoleri and Karplus, Biopolymers 29:1847-62 (1990);
Auffinger and Wipff, J. Comput. Chem. 11:190 (1990)] and subsequent thorough minimization to
energy-rank different conformers. The conformers within the lowest 5-7 kcal/mole energy bandwidth are
then compared. This method of "quenched molecular dynamics" has been applied to a tumor surface
octapeptide and peptide fragments of different proteins (Brooks, supra), tuftsin and its cyclic analogs
[O'Connor et al., J. Med. Chem. 35:2870-81 (1992)], and to linear and cyclic melanotropins [Al-Obeidi et
al., J. Peptide Res. 51:420-31 (1998)]. While this approach has also been applied to larger systems such
as the 70 residues in hypervariable loops of antibodies [Bruccoleri and Karplus, Biopolymers 29:1847-62
(1990)], not enough conformers are generated for such large systems to provide complete conformational
coverage [Dill, Biochemistry 24:1501-9 (1985)]. Thus this approach when applied to a 32mer will only
allow examination of a few of the expected low energy conformers, giving only a rough idea of the overall
fold. It may however allow a more significant coverage of conformation space for a cyclic 18mer or two
linear 7mer peptides.

We applied this methodology first to the disulfide bond-constrained cyclic 18mer peptide, 
cyclic[CGTIVTMEYRIDRTRSFC], which is thought to be a sub-nM inhibitor of elastase and other
proteases (Leatherbarrow and Salacinski, supra). For the cyclic peptide to be an inhibitor with similar
potency to Ci2b, it presumably should have low energy conformers of similar structure and rigidity to the
inhibitor loop of Ci2b. In its free form and in its complex with subtilisin Novo [McPhalen and James,
Biochemistry 26:261-9 (1987)], the overall backbone of the inhibitor loop is roughly planar, with the R65,
R67 and F69 side chains filling the interior of the loop. The edge which docks into the subtilisin binding
site is an irregular beta sheet, with the side chains of I56, T58, M59, and Y61 extending into solution or
into subtilisin in the bound complex, and the side chain of M59 noticeably bent from the solution structure
when docked with subtilisin. The structure of the native loop and the energy distribution of the minimized
conformers (two trajectories, 5.4 ns, 2700 structures) of this cyclic peptide were determined (data not
shown). The energy distribution is roughly Gaussian, as shown previously for cyclic melanotropin analogs (Al-Obeidi et al,
supra). The low energy conformers found for the cyclic peptide mimicking the loop appear to be
significantly more compact and globular than the native inhibitor loop (data not shown) and they have
backbone atom root mean square deviations from the 18mer inhibitor loop of 6.22, 6.12, 5.46, 6.31, 5.72,
5.69 and 5.64 Å, respectively. Residues 3-10, which form much of the subtilisin-contact region and which
surround the reactive site of the inhibitor loop (met 7-glu 8), have heavy atom root mean square
deviations from the same residues in the native loop of 5.65, 6.40, 3.95, 5.54, 5.20, 5.07 and 4.59 Å,
respectively. In addition, the side chains of R65, R67 and F69 are not buried inside the loop region in any
of these low energy conformers. These results are consistent with significantly different structures for the
low energy conformers when compared to Ci2b's inhibitory loop. These significant structural differences
are consistent with the failure to observe inhibition of elastase with this cyclic 18mer.
We next applied quenched molecular dynamics to look at low energy conformers of EFLIVKS when
dimerized. These gas phase calculations may be particularly relevant to the low energy forms of peptide
dimers observed by mass spectrometry. Peptide dimers were constructed by either tethering the two
peptides together at approximately their centers of mass, or binding different parallel and antiparallel
starting configurations (see the methods section) together in the gas phase before starting the
conformation search. A total of 7 trajectories were run, covering 5250 minimized structures and 10.5 ns
(data not shown), and a backbone overlay of the 14 lowest energy dimers, covering the lowest 7 kcal/mole
energy bandwidth, from all of the trajectories was performed (data not shown). This overlay was created by a
least-squares superposition of all backbone atoms of one of the peptide dimerizers. Both peptides
appear to adopt a turn conformation, but are not symmetric across the inter-dimer axis.

In addition, a cluster graph of the 14 lowest energy conformers was created, in which the backbone atoms of each
conformer are compared to those of every other conformer and the RMS deviation (Å) is reported
(data not shown). Two conformers are most similar if their RMSD difference is in the 0 - 1 Å range. There
appears to be one main family of low energy conformers, and several other unique conformations (data
not shown). For all of the peptides, both the N- and C-terminus of the first peptide, which are charged in
these simulations, appear to be close to the N- and C-terminus of the second peptide of the dimer. Since
each peptide has two acidic and two basic groups, there are a number of different intra-dimer ion pairs
which are possible. Examination of the distances for all possible inter-dimer ion pairs in all 14 low energy
conformers suggests that the most stable ion pairs are a) peptide 1: N-terminus to peptide 2: glu 1 side
chain carboxylate; b) peptide 1: lys 6 ε-amine to peptide 2: glu 1 side chain carboxylate; and c) peptide
2: C-terminus to peptide 1: lys 6 ε-amine. Both peptides form somewhat stable intramolecular ion pairs
between their own N- and C-termini as well.
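
The grouping of conformers into families by pairwise RMSD can be illustrated with a minimal sketch. It assumes that a symmetric matrix of pairwise backbone RMSD values (Å) is already available; the single-linkage rule, the function name `families`, and the example matrix are illustrative assumptions and are not taken from the calculations described above. Only the 1 Å similarity threshold comes from the text.

```python
def families(rmsd, cutoff=1.0):
    """Group conformer indices into families (single linkage).

    Two conformers are linked when their pairwise RMSD is <= cutoff (in Å).
    `rmsd` is a symmetric list-of-lists matrix; the linkage rule is an assumption.
    """
    n = len(rmsd)
    parent = list(range(n))

    def find(i):
        # Follow parent pointers to the family root, compressing the path.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i in range(n):
        for j in range(i + 1, n):
            if rmsd[i][j] <= cutoff:
                parent[find(i)] = find(j)

    groups = {}
    for i in range(n):
        groups.setdefault(find(i), []).append(i)
    # Largest family first.
    return sorted(groups.values(), key=len, reverse=True)


# Hypothetical RMSD matrix for five conformers (values invented for illustration).
example = [
    [0.0, 0.5, 0.8, 3.0, 4.0],
    [0.5, 0.0, 0.6, 3.2, 4.1],
    [0.8, 0.6, 0.0, 2.9, 3.9],
    [3.0, 3.2, 2.9, 0.0, 2.5],
    [4.0, 4.1, 3.9, 2.5, 0.0],
]
print(families(example))
```

With this invented matrix the first three conformers form one family and the remaining two are unique, which mirrors the pattern of one main family plus several unique conformations described above.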

Quenched molecular dynamics was also used to examine low energy structures of the Ci2b 18mer test
insert fused to EFLIVKS at each end (data not shown). This peptide is relatively inert to elastase, has 37
slowly exchanging protons, and shows no evidence for higher order aggregates (data not shown) when
observed by mass spectrometry. A total of 6900 different structures were collected from 12.3 ns of
dynamics trajectories. The energies of these structures followed a Gaussian distribution (data not shown).
Two conformers were at least 7 kcal/mole lower in energy than all others (data not shown). Both
conformers appear compact and globular, consistent with other experimental results above. As with the
EFLIVKS dimer modeled above, each terminal EFLIVKS attached to the 18mer insert appears to form a
turn, and their N- and C-termini are within 3.8 - 4.3 Å. However, unlike the structure of the EFLIVKS
dimer, their second termini, which are now fused to the 18mer insert, do not loop back to the center of the
molecule, but are instead 11.5 - 15 Å apart in the two conformers. This distance is significantly greater
than the comparable distance in the native Ci2b structure (4.1 Å) and in the cyclic peptide low energy
conformers (6.88 Å on average), suggesting that the dimerizer peptides, at least with this insert, form a
"loop" with a fairly wide base. The 18mer insert also appears to contain a significant proportion of turn
structure, consistent with circular dichroism measurements.



Example 11

Circular dichroism studies, NMR measurements and peptide conformation searches on peptide constructs

Circular dichroism measurement 

CD spectra were recorded on an AVIV 62A DS CD spectropolarimeter (Lakewood, NJ, USA) equipped
with a Peltier temperature control unit. The temperature of the instrument was maintained constantly
below 20°C using a Neslab CFT-33 refrigerated recirculator water bath. The device was periodically
calibrated with the ammonium salt of (+)-10-camphorsulfonic acid according to the manufacturer's
recommendations. Spectra were recorded between 250 and 195 nm at 0.2 nm intervals with a time
constant of 1 s at 25°C. Data were collected from five separate scans and averaged using an IBM PS/2
computer. A cylindrical quartz cell of path length 0.1 cm was used for the spectral range with a sample
concentration of 0.02 - 0.05 mM as determined by amino acid analysis. Peptide stock solutions (1 mM)
were made in 10 mM KPO4 buffer containing 100 mM KF at pH 7.5 except as noted. For pH titration
experiments, the pH of the buffer was carefully adjusted to the desired value using either 0.1 M HCl or 0.1 M
NaOH before adding the above peptide stock solution. Mean residue ellipticity (MRE) in deg·cm²·dmol⁻¹
was obtained through the equation

$$\mathrm{MRE}(\lambda) = \frac{\theta(\lambda)}{10\, l\, c\, n}$$

where θ(λ) is the ellipticity in degrees at wavelength λ, l is the path length in cm, c is the concentration in
M, and n is the number of residues in the peptide/protein [Schmidt, in Protein Structure: A Practical Approach,
IRL Press, New York, pp 251-285 (1989)]. Raw data collected from individual experiments were
converted to an ASCII format and the plots were created using the Microsoft Excel software package as
described previously [Gururaja and Levine, Peptide Res. 9:283-289 (1996)]. Thermal denaturation data
were taken on samples containing 20 µM peptide in 10 mM KPO4 buffer containing 100 mM KF at pH 7.5.
The thermal denaturation was measured at 220 nm over a range of 4-98°C with a temperature step of 2°C,
a 2 min equilibration time and a 60 s signal averaging time. Apparent Tm was calculated as the
maximum of the first derivative of the CD signal at 220 nm with respect to T. CD spectra were
deconvoluted using the program Dichroprot v. 2.4,
which uses the variable selection method of Johnson.
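
The two calculations described above, the MRE conversion and the apparent Tm taken as the maximum of the first derivative of the melting curve, can be sketched as follows. This is only an illustrative sketch: the function names, the central-difference derivative, and the synthetic melting curve are assumptions, and the unit of the raw ellipticity is taken as given in the text (the factor of 10 in the denominator is the one conventionally paired with ellipticity reported in millidegrees, so the instrument output unit should be checked).

```python
import math


def mean_residue_ellipticity(theta, path_cm, conc_molar, n_residues):
    """MRE = theta / (10 * l * c * n), as written in the text.

    theta: observed ellipticity (unit as reported by the instrument),
    path_cm: path length in cm, conc_molar: concentration in M,
    n_residues: number of residues in the peptide.
    """
    return theta / (10.0 * path_cm * conc_molar * n_residues)


def apparent_tm(temps, signal):
    """Temperature at which d(signal)/dT is largest.

    Uses central differences on interior points; the numerical scheme
    is an assumption, not the procedure used for the reported values.
    """
    best_t, best_slope = None, None
    for k in range(1, len(temps) - 1):
        slope = (signal[k + 1] - signal[k - 1]) / (temps[k + 1] - temps[k - 1])
        if best_slope is None or slope > best_slope:
            best_t, best_slope = temps[k], slope
    return best_t


# Synthetic two-state melting curve sampled every 2 °C from 4 to 98 °C
# (invented data with a midpoint placed at 40 °C).
temps = list(range(4, 100, 2))
signal = [-10.0 + 8.0 / (1.0 + math.exp(-(t - 40.0) / 3.0)) for t in temps]

print(mean_residue_ellipticity(-20.0, 0.1, 5e-5, 23))
print(apparent_tm(temps, signal))
```

The sampling grid (4-98°C in 2°C steps), the 0.1 cm path length and the 23-residue length match the conditions described above; everything else in the example is invented.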

NMR measurements 
All deuterated solvents such as D2O (99.96% D) and DCl (99.5% D) for NMR experiments were
purchased from Cambridge Isotope Laboratories (Andover, MA). Samples (~1 mM) were prepared by
dissolving the synthetic peptide in 0.7 ml of H2O:D2O 90:10 (v/v) or 100% D2O. Samples in water were
prepared by dissolving HPLC-purified peptide and adjusting the pH to 4.0 with HCl or DCl. All pH
values were measured at room temperature; the values reported herein are apparent pH values and were
not corrected for the deuterium isotope effect. TSP [3-(trimethylsilyl)propionic-2,2,3,3-d4 acid, sodium salt]
was used as an internal chemical shift standard.

1H NMR experiments were performed on a Varian Unity INOVA-500 spectrometer at 25°C
equipped with a Sun SPARCstation 5 as described previously [Naganagowda et al., J. Biomol.
Struct. Dynam. 16:91-107 (1998)]. Two dimensional Double Quantum Filtered Correlated
Spectroscopy (DQF-COSY) [Rance et al., Biochem. Biophys. Res. Commun. 117:479-485
(1983)], Total Correlation Spectroscopy (TOCSY) [Bax and Davis, J. Magn. Reson. 65:355-360
(1985)], Rotating frame Overhauser Enhancement Spectroscopy (ROESY) [Bothner-By et al., J.
Am. Chem. Soc. 106:811-813 (1984)] and Nuclear Overhauser Enhancement Spectroscopy
(NOESY) [Macura and Ernst, Mol. Phys. 41:95-117 (1980)] experiments were acquired in pure
phase absorption mode with quadrature detection in the t1 dimension using the hypercomplex
method [States et al., J. Magn. Reson. 48:286-292 (1982)]. The carrier was placed on the water
resonance to enable irradiation of the water during the relaxation delay (1.5 to 2.5 s) and during
the mixing time in NOESY experiments. For TOCSY spectra, the MLEV-17 sequence was used with a
spin lock time of 50 to 85 ms. For ROESY experiments spin lock times of 200 and 250 ms were
used, while for NOESY, mixing times of 200 and 300 ms were used. The 1H NMR spectrum in H2O
had a spectral width of 5400 Hz in both dimensions. In D2O solvent, after complete exchange
of the amide protons, the spectrum was recorded by reducing the spectral width to 3000 Hz in
both dimensions. 256 or 512 t1 increments were acquired with a size of 1024 or 2048 data
points. Slowly exchanging amide protons were identified by dissolving the samples in D2O and
recording 1D and TOCSY spectra immediately. For temperature coefficient measurements of
the amide protons, 1D and TOCSY experiments were performed between 25 and 50°C in steps
of 5°C. Typically 16 or 32 scans were collected for DQF-COSY and TOCSY spectra, and 64
scans for ROESY and NOESY spectra. Prior to Fourier transformation, the free induction
decays (FIDs) were zero filled once in both dimensions. For processing of DQF-COSY spectra,
a squared sine-bell window function shifted by 90° was used in both dimensions, whereas for
the TOCSY, ROESY and NOESY spectra, the data were processed separately, using 90° and
45° shifted squared sine-bell window functions.

The 1H-1H distances for structure determination were deduced from NOE cross peak intensities
in the 2D-NOESY spectrum obtained with a 200 ms mixing time in water. Ranges of interproton
distances were calculated by comparing the volumes of the cross peaks and were categorized
into three classes, 1.8-2.5 Å (strong), 2.5-3.5 Å (medium), and 3.5-5.0 Å (weak), for the
distance geometry calculations. The vicinal coupling constants ³J(NH-CαH) of each residue were
taken from the NMR studies and used to estimate possible torsional angles via the Karplus
relationship [Karplus, J. Chem. Phys. 30:11-15 (1959)]: $^{3}J_{NH\text{-}C\alpha H} = A\cos^{2}\theta - B\cos\theta + C$, where
$\theta = |\phi - 60^{\circ}|$. The A, B and C constants proposed by Pardi et al., J. Mol. Biol. 180:741-751
(1984) have values of 6.4, 1.4 and 1.9, respectively. The techniques used to obtain
conformational data will only be briefly summarized as these have been discussed in great detail
elsewhere. Delineation of conformation by NMR is based on the
measurement of torsional angles along the polypeptide chain using two-dimensional NMR data
acquired at high magnetic field strength. Specifically, to assign protons that are coupled through
bonds, TOCSY experiments are performed. Sequential assignments, for example Hα(i)-NH(i+1),
are based on NOESY and ROESY experiments, in which a correlation is observed between
protons in close spatial proximity, which is then an indicator of conformation.
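
As an illustration of the Karplus relationship and the NOE distance classes given above, the following minimal sketch evaluates the coupling constant for a given backbone angle φ using the Pardi et al. constants quoted in the text. The function and variable names are illustrative assumptions, and the two φ values in the usage lines (-120° for an extended strand, -60° for a helix) are textbook values chosen for the example rather than values from this work.

```python
import math

# Karplus constants (Hz) of Pardi et al. as quoted in the text.
A, B, C = 6.4, 1.4, 1.9

# NOE intensity classes and the distance ranges (Å) assigned to them in the text.
NOE_BOUNDS = {
    "strong": (1.8, 2.5),
    "medium": (2.5, 3.5),
    "weak": (3.5, 5.0),
}


def karplus_j(phi_deg):
    """3J(NH-CaH) in Hz for backbone angle phi (degrees), with theta = |phi - 60|."""
    theta = math.radians(abs(phi_deg - 60.0))
    return A * math.cos(theta) ** 2 - B * math.cos(theta) + C


print(round(karplus_j(-120.0), 2))  # extended (beta-strand-like) phi
print(round(karplus_j(-60.0), 2))   # helical phi
print(NOE_BOUNDS["medium"])
```

For φ = -120° the expression gives 6.4 + 1.4 + 1.9 = 9.7 Hz, and for φ = -60° it gives 1.6 + 0.7 + 1.9 = 4.2 Hz, in line with the values of roughly 9 Hz for a β-strand and roughly 4 Hz for an α-helix referred to in the NMR analysis.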

Peptide conformation searches 

Low energy conformers of different peptide constructs were generated as follows. Explicit atom models
of the peptide constructs were built using Insight II 95.0 (Molecular Simulations Inc., San Diego, CA) and
the cff91 forcefield [Maple et al., J. Comp. Chem. 15:162-182 (1994)]. Peptides were modeled as zwitterions,
with lys, arg, asp and glu fully ionized in addition to the N-terminal amine and C-terminal
carboxylate. The effects of an aqueous solvent environment and counterion screening were simulated
by the use of a linear distance-dependent dielectric constant. The Verlet algorithm [Verlet, Phys. Rev.
159:98-103 (1967)] with a time step of 1 fs was used to integrate the equations of motion; this was
implemented as the default leapfrog algorithm of Discover 2.9.7. A 15 Å cutoff was used for nonbonded
interactions. Peptide bonds were restrained to the trans conformation at high temperatures using a
torsional restraint of 5 kcal/mol/rad². In the dynamics protocol, based on modifications of a program
written by Mackay et al. [Mackay et al., in Prediction of Protein Structure and the Principles of Protein
Conformation (Fasman, ed.; New York, Plenum Press) pp 317-358 (1989)], the starting peptide
structures were first minimized using 300 steps of steepest descent and 1000 steps or as many steps of
conjugate gradient minimization as necessary so that the maximum energy derivative was less than 0.1
kcal/Å, to remove high energy structures created during construction of the molecule. The peptide atoms
were assigned random initial velocities using the dseed variable, and the peptide was heated to 900 K
over 2 ps. Individual trajectories were continued for times varying from 400 ps to 3 ns with individual
structures collected every 1-2 ps for subsequent minimization. Each saved structure was equilibrated at
900 K for 50 fs, cooled to 300 K over 5 ps, and minimized with 300 steps of the steepest descents
algorithm followed by Fletcher-Reeves conjugate gradient minimization using as many steps as
necessary to give a maximum energy derivative of less than 0.001 kcal/mole/Å. The minimized total
energy vs. number of conformers in individual 5 kcal/mole windows was plotted for each peptide. The
conformers in the lowest 5 kcal/mole window above the minimum energy [O'Connor et al., J. Med. Chem.
35:2870-81 (1992)] were selected for further analysis.
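
The final selection step, binning minimized energies into 5 kcal/mole windows and keeping the conformers in the lowest window above the minimum, can be sketched as below. This is a hedged illustration only: the function names and the example energies are invented, and the sketch covers only the bookkeeping after minimization, not the dynamics or minimization themselves.

```python
from collections import Counter


def window_counts(energies, width=5.0):
    """Count conformers per energy window of `width` kcal/mole above the minimum."""
    e_min = min(energies)
    return Counter(int((e - e_min) // width) for e in energies)


def lowest_window(energies, width=5.0):
    """Indices of conformers within `width` kcal/mole of the minimum energy."""
    e_min = min(energies)
    return [i for i, e in enumerate(energies) if e - e_min <= width]


# Hypothetical minimized total energies (kcal/mole) for six saved structures.
energies = [-310.2, -308.0, -304.9, -299.5, -290.1, -306.0]

print(dict(window_counts(energies)))
print(lowest_window(energies))
```

Plotting the counts returned by `window_counts` against the window index gives the kind of energy distribution histogram referred to above, while `lowest_window` returns the conformers that would be carried forward for further analysis.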

Starting structures for the different peptides were obtained as follows. For dimerized EFLIVKS, extended
structures were aligned in a parallel or anti-parallel fashion, with the Cγ1 of ile 4 ca. 7 Å apart, giving 4
different starting structures. Two extended structures (parallel and antiparallel) were tethered together
with an energy penalty of 100 kcal/mole when the distance between the Cγ1 of ile 4 of both peptides was
outside of the range of 1.5-12 Å. For the putative protease inhibitor cyclic[CGTIVTMEYRIDRTRSFC],
the initial structure was a mixture of right handed alpha helix and beta sheet allowing formation of a
disulfide bond between the two terminal cysteines. A second run started with a partially minimized
version of the first structure.

For the peptide dimer-constrained construct EFLIVKS-VGTIVTMEYRIDRTRSFV-EFLIVKS, several 
different starting structures were used. One started from the Ci2b-based structure (PDB file 2CI2) of the 
18mer insert, which was derived by removing all residues from the crystal structure except for the 
inhibitor loop, and mutating individual residues to give the 18mer sequence reported in Leatherbarrow 
and Salacinski (supra). EFLIVKS in an extended conformation was fused to each end of the peptide and 
the resulting construct was minimized as above. A second structure started from EFLIVKS fused as a 
beta sheet to each end of the 18mer Ci2b insert. A third started from EFLIVKS fused as a right handed 
alpha helix to each end of the 18mer Ci2b insert. A fourth started from an extended conformation for the 
entire construct, and a fifth started from a different partially extended conformation. A sixth run started 
with the entire construct as a beta sheet. 

Since the peptides studied here are soluble at neutral or near-neutral pH at levels well below the 
millimolar range needed for an NMR structure determination, we examined their solution structure using
circular dichroism (CD). Circular dichroism measurements are sensitive to the secondary structure of 
both peptides and proteins, and have been extensively used to examine the conformation of both 
[Bloemendal and Johnson, Pharm. Biotechnol. 7:65-100 (1995); Woody, Methods Enzymol. 246:34-71 
(1995); Greenfield, Anal. Biochem. 235:1-10 (1996)]. Here these measurements are used to examine 
the pH-dependence of secondary structure formation and stability, to compare the effects on insert 
structure of different dimerizers, to examine the effects of mutations in the dimerizers, and to look at the
effects of different insert sequences on the overall structure of dimerizer-constrained loops. When these 
measurements are combined with measurements of proteolytic susceptibility, deuterium exchange, and 
the results of conformational searches, they give information on the overall structure and folding of the 
mini-loops examined here. 

EFLIVKS-dimerized 9mer insert 
The first insert examined was EFLIVKS-STKSIPPQS-EFLIVKS. The 9mer insert represents an analog of
the protease inhibitor cyclic[CTKSIPPQC] (Gariani and Leatherbarrow, supra). The CD spectrum was
recorded between pH 3.5 - 8.5 (data not shown). A pH-dependent transition in secondary structure was
observed. At pH 3.5, a secondary structure with a strong minimum at 201 nm was seen. While this is
near the expected minimum for a random coil [Greenfield and Fasman, Biochemistry 8:4108-4116
(1969)] of 195-197 nm, the shape of the spectrum is also similar to that of a type I beta turn observed in
a short peptide [Perczel et al., Int. J. Peptide Protein Res. 41:223-236 (1993)].

1H-NMR examination of low pH structure

As this CD spectrum was seen with a number of other inserts under defined conditions, and
since this peptide was quite soluble at low pH, we examined this structure using NMR. The
resonance assignments of the 1H-NMR spectrum of the 9mer insert in water were made by standard
sequential assignment procedures [Wuthrich, in NMR of Proteins and Nucleic Acids, New York,
Wiley-Interscience, pp 166ff (1986)]. The assignments of 1H resonances were accomplished by
the combined analyses of 2D-TOCSY and 2D-NOE spectra. The 2D-TOCSY spectrum was also
recorded at various temperatures (25 to 50°C) to resolve overlapping connectivities for
unambiguous assignments, and was also used to determine the temperature coefficients of the
NH chemical shifts. The resonances buried under the water signal (in 90% H2O) were assigned
by recording the spectra in 100% D2O. The chemical shifts of all the assigned protons are listed
in Table 5. The temperature coefficients of NH chemical shifts, the 1H/2H exchange rate of amide
groups, ³J(NH-CαH) values, and a set of characteristic strong, medium, and weak NOE connectivities
have been used as criteria to examine whether the peptide has any preferred backbone
conformation in aqueous solution.

The temperature coefficients of all amide resonances are found to be ≥ 0.004 ppm K⁻¹ (data not
shown), suggesting that the backbone NH groups are exposed to the solvent and not involved in
any intramolecular hydrogen bonding interactions. The fast 1H/2H exchange rate observed for all
backbone amide resonances provides further evidence that the amide groups are not involved in 
any intramolecular hydrogen bonding. The prevalence of strong d^, , +?) and weak d mi () NOEs 
and a continuous stretch of weak and medium d mit i} and d aA , t NOEs in the absence of any 
observable d NN NOE interactions indicate that the backbone dihedral angles are predominantly in 
the unfolded extended region of φ, ψ space [Rance et al., Biochem. Biophys. Res. Commun.
117:479-485 (1983); Pardi et al., J. Mol. Biol. 180:741-751 (1984)]. The ³J(NH-CαH) values provided
in Table 5 are in the range of 6.5 to 8.4 Hz for all residues except Ser-7. For a regular β-strand,
the ³J(NH-CαH) is expected to be ~9 Hz, while for an α-helix it is ~4.0 Hz [Rance et al., Biochem.
Biophys. Res. Commun. 117:479-485 (1983); Pardi et al., J. Mol. Biol. 180:741-751 (1984)]. The
coupling constants of 6.5-8.4 Hz observed for this peptide suggest the existence of populations
of unfolded nonhydrogen bonded conformations of comparable energy with φ values exceeding
the regular helical region. Collectively, the NMR data provide evidence that EFLIVKS-STKSIPPQS-EFLIVKS
is unstructured in aqueous solution.
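
The statement that the coupling constants fall between 6.5 and 8.4 Hz for all residues except Ser-7 can be checked directly against the values listed in Table 5. The short sketch below does this; the dictionary is a transcription of the ³J(NH-CαH) column of Table 5 (residues without a listed value, E-1, P-13 and P-14, are omitted), and the names used are illustrative.

```python
# 3J(NH-CaH) values (Hz) transcribed from Table 5; residues with no value omitted.
J_VALUES = {
    "F-2": 8.13, "L-3": 7.80, "I-4": 7.92, "V-5": 8.22, "K-6": 6.54,
    "S-7": 6.17, "S-8": 6.66, "T-9": 7.23, "K-10": 6.68, "S-11": 7.58,
    "I-12": 8.02, "Q-15": 6.96, "S-16": 6.73, "E-17": 6.93, "F-18": 8.22,
    "L-19": 8.02, "I-20": 7.97, "V-21": 8.19, "K-22": 6.73, "S-23": 7.79,
}


def outside_range(values, low=6.5, high=8.4):
    """Residues whose coupling constant lies outside [low, high] Hz."""
    return [res for res, j in values.items() if not (low <= j <= high)]


print(outside_range(J_VALUES))
```

Only S-7 (6.17 Hz) falls outside the stated range, in agreement with the text.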



Table 5. Compilation of 1H chemical shift values (ppm) and ³J(NH-CαH) coupling constants (Hz) for EFLIVKS-STKSIPPQS-EFLIVKS at pH 4.0. Entries marked [illegible] could not be recovered from the source.

Xaa (No.)  NH      CαH     3J(NH-CαH)   CβH                 CγH                 CδH             CεH
E-1        —       3.870   —
F-2        7.941   3.954   8.13         3.005               —                   —
L-3        8.060   4.193   7.80         1.347, 1.347        1.347               0.722, 0.670
I-4        7.926   3.978   7.92         1.678               1.380, 1.025        0.713
V-5        8.116   3.946   8.22
K-6        8.342   4.190   6.54         1.680, [illegible]  1.270, 1.970        1.525           2.827, 2.827
S-7        8.237   4.300   6.17         3.710, 3.710
S-8        8.338   4.369   6.66         3.775                                                   —
T-9        8.080   4.180   7.23         4.060               1.048                               —
K-10       8.182   4.208   6.68         1.690, [illegible]  1.280, 1.280        1.525, 1.525    2.839, 2.839
S-11       8.090   4.320   7.58
I-12       8.114   4.320   8.02         1.700               1.344, 1.030        0.771           —
P-13       —       4.196   —
P-14       —       4.266   —
Q-15       8.365   4.117   6.96         1.950, 1.860        2.240, 2.240        —               —
S-16       8.146   4.212   6.73
E-17       8.110   4.096   6.93         2.090, 1.876        2.148               —               —
F-18       7.946   4.438   8.22         3.005, 2.805                            —               —
L-19       7.822   4.181   8.02         1.487, 1.487        1.487               0.738, 0.694    —
I-20       8.045   3.948   7.97         1.642               1.350, 1.020        0.720
V-21       8.151   3.936   8.19         1.860               0.748, 0.748
K-22       8.295   4.220   6.73         [illegible], 1.600  [illegible], 1.290  1.530           2.835, 2.835
S-23       8.173   4.290   7.79

At pH 4.5, a different secondary structure was observed, which remained stable up to pH 8.5. The CD
spectra at pH 4.5-7.5 had a much diminished band at 202 nm, indicating a loss of random coil. They also
had a slight positive band at ca. 210-215 nm, and a negative band around 228-230 nm, indicating the
presence of beta turns [Brahms and Brahms, J. Mol. Biol. 138:149-178 (1980)]. Since at pH 5.0 this
construct has 3 or fewer slowly exchanging protons (Table 4), the peptide may be unfolded at this pH (i.e.
it has no tertiary structure) but with a secondary structure containing some beta turn and significantly
less random coil than at pH 3.5. Alternatively, if it is folding and has some tertiary structure, the
backbone is mobile enough so that no amide protons are sequestered from solvent for a long period of
time. When observing the CD spectrum at 225 nm, the structure present at pH 7.5 has a Tm of 39.6 ±
1°C (data not shown).

EFLIVKS-VGTIVTMEYRIDRTRSFV-EFLIVKS 

A second construct examined by CD contained the Ci2b 18mer insert, EFLIVKS-VGTIVTMEYRIDRTRSFV-EFLIVKS.
The pH-dependence of the CD spectrum of this peptide was
determined (data not shown). Unlike the first peptide examined above, the CD spectrum is not as
pH-dependent, and does not appear to have a major amount of random coil. The strong maximum around
210 nm and strong minimum at 225-230 nm are consistent with a significant content of beta turn
structure at all pH values examined [Brahms and Brahms, J. Mol. Biol. 138:149-178 (1980)]. The
smaller minimum seen at ca. 200 nm is consistent with a small percent of random coil, or the presence of
a type I beta turn [Perczel et al., Int. J. Peptide Protein Res. 41:223-236 (1993)]. Using the signal at 225
nm, the peptide can be melted with temperature, with a Tm of 39.85 ± 1.6°C.

Constructs with an N-terminal MG- and C-terminal -GPP 

For peptide expression in live cells, MG- was added to the N-terminus of a number of peptide
constructs, and -GPP was added to the C-terminus to block proteolysis by cellular carboxypeptidases
[Vanhoof et al., FASEB J. 9:736-44 (1995)]. The CD spectra of a variety of these peptides were then
compared at pH 7.5 (data not shown). Examination of the pH-dependence of the CD spectrum of
MGEFLIVKS-Ci2b insert-EFLIVKSGPP was performed (data not shown) and suggests that the additional
five residues cause significant changes in the CD spectrum compared to EFLIVKS-Ci2b insert-EFLIVKS.
The positive band at 208 nm is no longer distinct, and the negative band at 200 nm has
disappeared. The major minimum around 225 nm (characteristic of some beta turn structure) remains.
Thus the addition of these five residues appears to cause distinct conformational changes, but not
unfolding of the structure.

Addition of other insert sequences also resulted in rather different CD spectra. An insert consisting of the
flag epitope tag with glycine spacers, -G.DYKDDDDKG,-, designed to allow detection of expressed
peptide in cells using Western blots, resulted in a CD spectrum containing a minimum at ca. 202 nm and
a small minimum at ca. 220 nm (data not shown). Based on the similarity of this spectrum to that of
EFLIVKS-STKSIPPQS-EFLIVKS, this peptide appears to be mainly random coil between pH 3.5 - 8.5.
This construct does not have slowly exchanging protons, consistent with its unfolded structure. An insert
consisting of the influenza hemagglutinin epitope tag with glycine spacers, -G4YPYDVPDYASLG3-, gives
a CD spectrum with a minimum at 205-207 nm and a second smaller minimum at ca. 220 nm (data not
shown). This may be due to a somewhat different composition of secondary structures, and could
include some alpha helix (due to the minimum at 205-207 nm) as well as random coil or beta turn. Since
this construct also did not have slowly exchanging protons (Table 4), the CD spectrum may reflect the
presence of only secondary structure.

Other additions to the EFLIVKS sequence 

The effects of mutations in the EFLIVKS sequence on the CD spectrum of the Ci2b peptide insert were
determined (data not shown). The peptide EEFLIVKKS-Ci2b insert-EEFLIVKKS is of particular interest,
since it has 23 slow-exchanging protons and 8 intermediate-exchanging protons (Table 4) and thus may
have tertiary structure, and because this dimerizer may have a somewhat higher self-affinity than
EFLIVKS. It gives a CD spectrum which is similar to that of the control peptide, except that the
minimum at 202 nm is missing, and the maximum at 210 nm (control peptide) is shifted closer to 207 nm.
This peptide thus appears to have beta turn structure and less random coil than the control peptide.

To increase the solubility of the structure, lysines were added to the N-terminus with a glycine spacer.
For the construct KeG 4 -EFLIVKS-Ci2b insert-EFLIVKS, a very different CD spectrum was obtained than
for the control peptide, with a broad minimum at ca. 220 nm (data not shown). This spectrum does not
appear to be characteristic of any one dominant secondary structure, but can be deconvoluted to a
mixture of beta sheet and beta turn (58%), alpha helix (14%) and the rest random coil. Since this
structure has at most 5 slowly exchanging protons (Table 4), the additional residues added to the
N-terminus appear to have destabilized the tertiary structure of the control peptide, while creating a
different secondary structure.

Mutations in the EFLIVKS sequence 

Three charge modifications of the dimerizer sequence were tested at pH 7.0. In one peptide, a single lys
and glu were switched between dimerizers, giving KFLIVKS-Ci2b insert-EFLIVES. In a second peptide,
the glutamate of each dimerizer was mutated to lysine, giving KFLIVKS-Ci2b insert-KFLIVKS. In a third
peptide, the lys of each dimerizer was mutated to glu, giving EFLIVES-Ci2b insert-EFLIVES. Each
peptide had a CD spectrum resembling that of the control peptide of EFLIVKS-Ci2b insert-EFLIVKS
(data not shown). In a second set of mutations, the hydrophobic character of the dimerizer was
changed. First, F2 and I4 in both EFLIVKS sequences were mutated to lysine or to serine, giving a
dimerizer sequence on each terminus of the Ci2b insert of EKLKVKS or ESLSVKS. This resulted in a
major change in the CD spectrum, with the appearance of a large negative band at 202-205 nm,
indicating a significant increase in random coil structure, or denaturation (data not shown).
Second, only I4 was mutated to lysine in each dimerizer. This also changed the CD spectrum in a similar
fashion (data not shown). This construct had at most 1-2 slowly exchanging protons, suggesting that this
single change in the hydrophobic core of the EFLIVKS sequence was sufficient to disrupt the structure of
the entire peptide construct.
CLAIMS

We claim:

1. A composition comprising at least a first dimerization peptide comprising the sequence NH2-X1-X2-X3-X4-X5-COOH
that is no more than 8 amino acids long and wherein X1, X2, X3, and X4 are selected from
the group consisting of amino acids A, V, I, L, W, F, M and Y and X5 is selected from the group
consisting of K, R, D and E.

2. A composition according to claim 1, further comprising a second dimerization peptide comprising the
sequence NH2-X1-X2-X3-X4-X5-COOH that is no more than 8 amino acids long and wherein X1, X2, X3,
and X4 are selected from the group consisting of amino acids A, V, I, L, W, F, M and Y and X5 is
selected from the group consisting of K, R, D and E.

3. A composition according to claims 1 or 2, wherein at least said first dimerization peptide comprises
the sequence NH2-FLIVK-COOH.

4. A composition according to claims 1, 2 or 3, wherein at least said first dimerization peptide comprises
the sequence NH2-X0FLIVX5-COOH, wherein X0 and X5 are selected from the group consisting of amino
acids E, D, K, and R.

5. A composition according to claims 1, 2, 3 or 4, further comprising a first protein, wherein at least said
first dimerization peptide is covalently joined to said first protein forming a first fusion protein.

6. A composition according to claim 5, further comprising a second dimerization peptide covalently
joined to said fusion protein, wherein said second dimerization peptide comprises the sequence
NH2-X1-X2-X3-X4-X5-COOH that is no more than 8 amino acids long and wherein X1, X2, X3, and X4 are selected
from the group consisting of amino acids A, V, I, L, W, F, M and Y and X5 is selected from the group
consisting of K, R, D and E.

7. A composition according to claim 6, wherein said first dimerization peptide is joined to the N-terminus
of said first protein and said second dimerization peptide is joined to the C-terminus of said first protein.

8. A composition according to claim 6, wherein at least one of said first dimerization peptide or said
second dimerization peptide is covalently joined to an internal position of said protein.

9. A composition according to claim 5, 6, 7 or 8, wherein at least one of said first dimerization peptide or
said second dimerization peptide is covalently joined to said first protein via a linker.

10. A composition according to claims 1, 2, 3, 4, 5, 6, 7, 8 or 9, further comprising a fusion partner.

11. A molecular library comprising a plurality of members each comprising a composition according to
claims 5, 6, 7, 8, 9 or 10, wherein each of said members comprises a different first protein.

12. A recombinant nucleic acid encoding the composition of claims 1, 2, 3, 4, 5, 6, 7, 8, 9, 10 or 11.

13. An expression vector comprising the recombinant nucleic acid of claim 12.

14. A host cell comprising the recombinant nucleic acid of claim 12.

15. A method of producing the composition of claims 1, 2, 3, 4, 5, 6, 7, 8, 9, 10 or 11 comprising:

a) providing a cell according to claim 14; and 

b) subjecting said cell to conditions, whereby said composition is expressed. 

16. A method for screening for compositions capable of altering the phenotype of a cell, said method 
comprising: 

a) introducing a recombinant nucleic acid according to claim 12 into a plurality of cells; 

b) subjecting said plurality of cells to conditions whereby protein encoded by said nucleic acid is 
expressed; 

c) screening said plurality of cells for a cell exhibiting an altered phenotype. 

17. A method according to claim 16, further comprising isolating said cell exhibiting an altered 
phenotype. 

18. A method according to claim 16, further comprising isolating a nucleic acid from said cell. 

19. A method according to claim 16, further comprising isolating a target molecule. 
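
For illustration only, the residue pattern recited in claim 1 (four residues drawn from A, V, I, L, W, F, M, Y followed by one of K, R, D, E, in a peptide of no more than 8 amino acids) can be expressed as a simple sequence check. The sketch below is a reading aid and not a construction of the claims; the function name, the use of a regular expression, and the treatment of the 8-residue limit as applying to the whole peptide are assumptions.

```python
import re

# X1-X4 from {A, V, I, L, W, F, M, Y}, followed by X5 from {K, R, D, E}.
CORE = re.compile(r"[AVILWFMY]{4}[KRDE]")


def matches_core_pattern(peptide, max_len=8):
    """True if the peptide is at most max_len residues and contains the core pattern."""
    peptide = peptide.upper()
    return len(peptide) <= max_len and CORE.search(peptide) is not None


for seq in ("EFLIVKS", "FLIVK", "EKLKVKS", "ESLSVKS"):
    print(seq, matches_core_pattern(seq))
```

Under these assumptions EFLIVKS and FLIVK contain the pattern, while the hydrophobic-core mutants EKLKVKS and ESLSVKS described in the CD studies do not.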
(54) Title: PEPTIDES CAUSING FORMATION OF COMPACT STRUCTURES
(57) Abstract

The present invention is directed to compositions and methods comprising peptides which have a high affinity for each other and,
when linked to a protein, are used to help fold the protein into a compact structure. By virtue of its stability and constraints, this scaffold
can prolong the activity of any embedded protein sequences in the presence of cellular and other proteases. The compact structure can
have other functional sequences embedded, and is preferable to linear and less constrained peptides for library screening, for creating
structurally-biased peptide libraries and for targeting to specific intracellular and extracellular compartments. Compositions of the present
invention can be displayed on the surface of viruses, archaebacteria, prokaryotic and eukaryotic cells for library screening, drug screening
and display. Methods of the present invention are useful for screening in vivo for intracellular effector proteins modulating signaling
pathways and to identify interacting proteins in vitro. Thus, the present invention is useful as a scaffold for gene therapy, for the isolation
of new therapeutic drug leads and for potential use as a therapeutic in physiological fluids.